Motifs with non-standard/biased compositions – similarities

The division into standard and non-standard/biased residue compositions

Developers of methods that identify unusual protein motifs from their composition distinguish homopolymers, short tandem repeats, low complexity regions, and compositionally biased regions. Homopolymers and short tandem repeats are sequences built from short repeating patterns. Low complexity regions relax this definition to include motifs composed of only a few residue types, which may be arranged irregularly. Compositionally biased regions allow a highly diverse amino acid composition, as long as at least one residue type is overrepresented. This classification is useful for checking whether such unusual compositions have a biological meaning.
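To make these categories more concrete, here is a minimal Python sketch that measures the composition of a sequence. It is an illustration only, not the algorithm of any published tool: the function names, the `max_unit` limit and the use of Shannon entropy as a complexity measure are my assumptions.

```python
import math
from collections import Counter


def composition(seq):
    """Return the fraction of each residue type in seq."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()}


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the residue composition.

    A homopolymer scores 0; the more evenly residue types are used,
    the higher the value.
    """
    return -sum(p * math.log2(p) for p in composition(seq).values())


def repeat_unit_length(seq, max_unit=3):
    """Return the shortest unit length (1..max_unit) whose exact
    repetition reproduces seq, or None if there is no such unit.

    A result of 1 means a homopolymer; 2 or more means a perfect
    short tandem repeat. max_unit is an arbitrary, hypothetical limit.
    """
    seq = seq.upper()
    for k in range(1, max_unit + 1):
        if len(seq) < 2 * k:
            break  # need at least two copies of the unit
        repeated = (seq[:k] * (len(seq) // k + 1))[:len(seq)]
        if repeated == seq:
            return k
    return None


if __name__ == "__main__":
    for s in ["QQQQQQ", "GPPGPPGPP", "GSGGSSGSGS", "MKTAYIAK"]:
        print(s, round(shannon_entropy(s), 2), repeat_unit_length(s))
```

A low entropy with no repeat unit (as for the third sequence) corresponds to the irregular arrangement of a few residue types that characterises low complexity regions. Detecting compositional bias would additionally require comparing `composition(seq)` with background residue frequencies, which this sketch does not do.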

However, when searching for similarities, these sequence features become less important; the goal is rather to find sequences that may share a similar function or be homologous. The prevailing view of amino acid importance in protein domains holds that rare (costly) residues, such as tryptophan, matter more to a domain's function than frequent (cheap) residues. On the other hand, there are proteins such as collagens in which frequent residues, namely glycine and often proline, play a key role in function. When searching for similarities to a protein motif, we should therefore ask whether its frequently occurring residues play a key role in its function or not. Consequently, the division into standard and non-standard compositions seems more rational than the division into high complexity regions, compositionally biased regions, low complexity regions, short tandem repeats and homopolymers.

Which method should I select to search for similar motifs?

Protein domains are not yet characterised well enough to determine this automatically from the protein sequence. However, some guidelines can be given. First, we need to know why we want to search for similarities.

If you want to search for similarities to a whole protein sequence, I recommend one of the local alignment based methods such as BLAST, HHblits or HMMER. If the sequence contains functionally important fragments with non-standard compositions, I recommend BLAST with tuned parameters. If you want to discover more distant homologies, a method based on profile hidden Markov models may be a good choice.
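As an illustration of what "tuned parameters" could look like, the sketch below assembles a BLAST+ `blastp` command in Python. Which settings to tune is my assumption, not a rule: the sketch explicitly disables SEG low complexity masking (`-seg no`) and composition-based statistics (`-comp_based_stats 0`), both of which can suppress hits to regions with non-standard compositions. The file name `motif.fasta`, the database name `proteins_db` and the helper `build_blastp_command` are hypothetical.

```python
import shlex


def build_blastp_command(query_fasta, database, evalue=10.0):
    """Assemble a blastp command line as a list of arguments.

    The chosen settings are assumptions for illustration, not
    recommended values for every search.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-db", database,
        "-seg", "no",              # do not mask low complexity regions in the query
        "-comp_based_stats", "0",  # no composition-based score adjustment
        "-evalue", str(evalue),    # E-value threshold for reported hits
        "-outfmt", "6",            # tabular output
    ]


if __name__ == "__main__":
    cmd = build_blastp_command("motif.fasta", "proteins_db")
    # Print the command for inspection; pass cmd to subprocess.run()
    # to execute it if BLAST+ is installed and the database exists.
    print(shlex.join(cmd))
```

Relaxing these filters usually increases the number of spurious hits as well, so the results need more careful inspection than those of a default search.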

When searching for similar domains, it has been proposed that methods which try to cover all features of the query sequence in their hits should perform better than local alignment based methods. Therefore, I would recommend the GLOBAL tool when analysing motifs with standard compositions and NSC-Search when analysing motifs with non-standard compositions.

