NSC-Search

Description

NSC-Search is a tool for searching for motifs with non-standard compositions, including short tandem repeats (STRs), low complexity regions (LCRs), and compositionally biased regions (CBRs). State-of-the-art methods for protein similarity search lack support for these motifs and focus mainly on homology searches. To handle motifs with non-standard compositions, NSC-Search offers three alignment strategies for comparing motifs: local, global-to-local, and global. It also introduces a new similarity score metric for comparing alignments.

Local alignment

This type of alignment aims to find local similarities between two protein sequences and it is especially valuable when searching for homologous proteins. The approach is particularly useful when the sequences vary significantly in length or when only certain parts of the sequences are homologous.
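As an illustration of the idea, here is a minimal sketch of a local alignment score in the Smith-Waterman style. It is not NSC-Search's implementation: the function name, the match/mismatch values, and the single linear gap penalty are assumptions chosen to keep the example short.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (illustrative scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor at 0 lets an alignment start anywhere in either sequence.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical sequences: the query repeat is found inside a longer sequence.
print(local_alignment_score("QQQQ", "MKTQQQQLLV"))
```

Because the score never drops below zero and the best cell may lie anywhere in the matrix, unrelated flanking regions of either sequence do not reduce the result.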

Global-to-local alignment

Also called global-local alignment and a type of semi-global alignment, this strategy aligns the whole query sequence to a locally similar region of a database sequence. One may use it either with a protein sequence fragment of interest as a query or with fragments identified using tools for different types of motifs with non-standard compositions (STRs, LCRs, and CBRs). Hits that do not cover all query sequence features are penalized through gap open and gap extend penalties.
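The difference from local alignment lies in the boundary conditions, which the following sketch tries to make concrete. It is an assumption-laden illustration rather than the tool's code: the function name and scoring values are hypothetical, and a single linear gap penalty stands in for the gap open and gap extend penalties.

```python
def global_to_local_score(query, target, match=2, mismatch=-1, gap=-2):
    """Score of aligning the whole query to the best region of target."""
    n, m = len(query), len(target)
    # Row 0 is all zeros: skipping a prefix of the target costs nothing.
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        # Column 0 is penalized: query residues cannot be skipped for free.
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + (
                match if query[i - 1] == target[j - 1] else mismatch
            )
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    # Maximum over the last row: skipping a suffix of the target is free.
    return max(prev)


# Hypothetical sequences: the second target contains only half of the repeat.
print(global_to_local_score("QQQQ", "MKTQQQQLLV"))
print(global_to_local_score("QQQQ", "MKTQQLLV"))
```

In this sketch, a target that contains only part of the query scores lower than one that contains all of it, because every query residue has to be matched, mismatched, or paid for as a gap.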

Global alignment

Global alignment aligns two protein sequences in full, requiring all features of both compared sequences to be included in the final alignment. Both query and database sequences can be motifs retrieved from protein sequences manually or using dedicated tools for their identification.
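A minimal sketch of a global alignment score in the Needleman-Wunsch style is shown here for orientation. The function name, the scoring values, and the linear gap penalty are assumptions for illustration only and do not reflect NSC-Search's actual parameters.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Score of aligning all of a against all of b (illustrative scoring)."""
    # Both borders are penalized: neither sequence may be skipped for free.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    # The bottom-right cell covers both sequences end to end.
    return prev[-1]


# Hypothetical motifs: identical repeats versus repeats of different length.
print(global_alignment_score("QQQQ", "QQQQ"))
print(global_alignment_score("QQQQ", "QQ"))
```

Since every residue of both sequences must appear in the alignment, a length difference between two motifs always costs gap penalties under this scheme.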

Similarity score metric

This metric determines the similarity between protein sequences by normalizing the raw alignment score. It first performs a self-alignment of the query sequence and then uses the resulting score to normalize the alignment score of interest. It can be calculated as S” = s / S, where lower-case s is the alignment score and capital S is the self-alignment score of the query sequence.
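The normalization can be sketched in a few lines. This example assumes a uniform match score, so the self-alignment score is simply the match score times the query length. With a substitution matrix the self-alignment score would be computed differently, and the function names here are hypothetical.

```python
def self_alignment_score(query, match=2):
    """Score of aligning the query with itself: every residue matches."""
    return match * len(query)


def similarity_score(raw_score, query, match=2):
    """Normalize a raw alignment score s by the query self-alignment score S."""
    self_score = self_alignment_score(query, match)
    if self_score <= 0:
        raise ValueError("query must be non-empty and match must be positive")
    return raw_score / self_score


# A hit reaching the full self-alignment score of the query yields 1.0.
print(similarity_score(8, "QQQQ"))
# A weaker hit yields a proportionally lower value.
print(similarity_score(6, "QQQQ"))
```

Under these assumptions the normalized value does not exceed 1, which makes scores for queries of different lengths easier to compare.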

Installation

The official version of NSC-Search can be downloaded from the method’s GitHub release page. The latest version tarball (.tar.gz file) is recommended. You can extract the archive using the following command, adjusting the version number to match the downloaded file:

tar zxvf nscsearch-1.0.0.tar.gz

To compile the project, you need to install its dependencies. On a Debian-like distribution, you can execute the following command:

sudo apt install g++ swig python3-dev

Then go into the project directory:

cd nscsearch-1.0.0

Now you can configure, build, and install the tool like any other open source project managed by Autotools. To configure and build the project, use:

./configure
make

If you want to install the tool system-wide, use:

sudo make install

Usage

If you installed the software, you can run it using the following syntax:

nscsearch -q [query_file_fasta] -d [database_fasta]

Otherwise, you have to provide the path to the executable, which is stored in the current directory. For instance:

./nscsearch -q [query_file_fasta] -d [database_fasta]