The CB-Search method has officially been released. It includes several features that improve the search for similar motifs with biased compositions. The most important of these is the similarity measure. Instead of estimating how often a sequence with a similar score would occur in a database, we analyse whether two sequences are compositionally similar, using a 2-mer filter, and how their residues are arranged, using one of the available alignment algorithms. Similarity is then assessed independently of sequence length. The motivation for decoupling similarity from length comes from examples such as collagen: to some extent, a protein may still form a collagen structure regardless of how many triplets it contains.
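To make the idea of a length-independent compositional check more concrete, here is a minimal Python sketch. It is not the CB-Search implementation: the function names, the use of cosine similarity over relative 2-mer frequencies, and the 0.8 threshold are all assumptions chosen for illustration. It only shows why two collagen-like repeats of very different lengths can still look compositionally similar.

```python
from collections import Counter
from math import sqrt


def two_mer_profile(seq: str) -> dict[str, float]:
    """Relative frequencies of overlapping 2-mers (hypothetical helper)."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    # Dividing by the total makes the profile independent of sequence length.
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def cosine_similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """Cosine similarity between two sparse 2-mer profiles."""
    dot = sum(value * q.get(kmer, 0.0) for kmer, value in p.items())
    norm_p = sqrt(sum(value * value for value in p.values()))
    norm_q = sqrt(sum(value * value for value in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # a sequence shorter than 2 residues has no 2-mers
    return dot / (norm_p * norm_q)


def passes_two_mer_filter(a: str, b: str, threshold: float = 0.8) -> bool:
    """Assumed filter rule: keep the pair if the profiles are close enough."""
    return cosine_similarity(two_mer_profile(a), two_mer_profile(b)) >= threshold


if __name__ == "__main__":
    short_repeat = "GPP" * 5    # collagen-like triplets, 15 residues
    long_repeat = "GPP" * 20    # same pattern, 60 residues
    unrelated = "ACDEFGHIKLMNQRSTVWY"

    print(passes_two_mer_filter(short_repeat, long_repeat))
    print(passes_two_mer_filter(short_repeat, unrelated))
```

In this sketch the short and long repeats pass the filter while the unrelated sequence does not, because the profiles are normalised before comparison. A pair that passes would then go on to an alignment step to compare how the residues are arranged; that step is not shown here.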
Despite these advances, much work remains on the method, even though it already goes well beyond other methods for protein sequence similarity analysis. One major open area is a new amino acid scoring matrix, or an entirely new approach to scoring, along with a method for tuning it. How amino acids are scored is fundamental to assessing the similarity of protein sequences.
CB-Search: https://doi.org/10.1186/s12859-026-06509-w
Scoring matrix challenges: https://doi.org/10.1038/s41598-024-82548-8
GitHub: https://github.com/patryk-jarnot/cb-search
