Complexity of computing the similarity between two sequences

Question

What is the computational complexity of the best known algorithm for computing the similarity between two sequences (as in DNA or Protein alignment/approximate string matching)?

The similarity is based on:

- scoring the alignment using substitution scoring matrices (for either global or position-specific substitutions over the 20 symbols of the protein alphabet or the 4 symbols of the DNA alphabet)
- a gap penalty
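
For concreteness, here is a minimal sketch of the kind of score described above, assuming a global (Needleman–Wunsch style) alignment with a linear gap penalty. The function names and the toy +1/-1 scoring function are hypothetical stand-ins; a real substitution matrix such as one for the 20-symbol protein alphabet would replace `simple_dna_score`. This baseline takes time proportional to the product of the two sequence lengths.

```python
def global_alignment_score(a, b, substitution, gap_penalty):
    """Global alignment score with a linear gap penalty (a sketch).

    substitution(x, y) returns the score for aligning symbol x with y.
    gap_penalty is the (typically negative) score added per gap symbol.
    Time is O(len(a) * len(b)); space is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap_penalty for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap_penalty] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = max(
                prev[j - 1] + substitution(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap_penalty,      # a[i-1] aligned with a gap
                curr[j - 1] + gap_penalty,  # b[j-1] aligned with a gap
            )
        prev = curr
    return prev[len(b)]


def simple_dna_score(x, y):
    # Hypothetical toy scoring: +1 for a match, -1 for a mismatch.
    return 1 if x == y else -1
```

For example, `global_alignment_score("ACGT", "AGT", simple_dna_score, -2)` aligns three matching symbols and pays for one gap.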

Is the linear-time approach based on the Burrows–Wheeler transform, as used in the Bowtie and BWA short-read aligners, the actual state of the art, or are there sub-linear algorithms that solve the same problem?

[Edit]: I am thinking of applying LSH (locality-sensitive hashing) for approximate matching, which would be sublinear assuming the reference dataset is pre-processed/indexed.
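
A rough sketch of what such an LSH index could look like, assuming MinHash over k-mer sets with banding. Everything here (function names, `k`, the band and row counts) is a hypothetical choice for illustration. Note that MinHash estimates the Jaccard similarity of k-mer sets, not an alignment score, and hashing the query is still linear in the query length; the sublinear part is the lookup with respect to the size of the indexed reference set.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, k=4, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over all k-mers."""
    shingles = kmers(seq, k)
    if not shingles:
        raise ValueError("sequence must contain at least k symbols")
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree."""
    return sum(1 for x, y in zip(sig_a, sig_b) if x == y) / len(sig_a)


def band_keys(signature, bands, rows):
    """Split a signature into `bands` chunks of `rows` values each."""
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


def build_index(references, k=4, bands=16, rows=4):
    """Pre-processing step: bucket every reference sequence by its bands."""
    index = defaultdict(set)
    for name, seq in references.items():
        for key in band_keys(minhash_signature(seq, k, bands * rows), bands, rows):
            index[key].add(name)
    return index


def query(index, seq, k=4, bands=16, rows=4):
    """Return names of references sharing at least one band with seq."""
    candidates = set()
    for key in band_keys(minhash_signature(seq, k, bands * rows), bands, rows):
        candidates |= index.get(key, set())
    return candidates
```

The candidates returned by `query` would still need to be verified with a real alignment, since sharing a band only suggests that the k-mer sets overlap heavily.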

How are you defining "similarity"? – templatetypedef Feb 09 '13 at 04:59

score 1 · Answer 1 · answered Feb 09 '13 at 03:14

I guess at some point you end up reading both sequences in their entirety, so there cannot be a sub-linear time algorithm.

zad

What about using bitwise operators? – alex Feb 09 '13 at 03:23

If you are measuring complexity in terms of the sequence length, using bitwise operators may lower the constant factor but not the order. – zad Feb 09 '13 at 03:37
