
What is the computational complexity of the best-known algorithm for computing the similarity between two sequences (as in DNA or protein alignment, or approximate string matching)?

The similarity is based on:

  1. scoring the alignment with substitution scoring matrices (either global or position-specific substitution scores over the 20-symbol protein alphabet or the 4-symbol DNA alphabet)

  2. a gap penalty
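For reference, the textbook dynamic program that scores an alignment with a substitution matrix plus a gap penalty (Needleman-Wunsch for global alignment) fills an n × m table, so it runs in O(nm) time; an affine gap penalty (Gotoh's variant) keeps the same order. The sketch below is a minimal illustration assuming a linear gap penalty and a toy +1/-1 DNA scoring dictionary (`TOY_SUBST`, a hypothetical stand-in for a real matrix such as BLOSUM62 or a position-specific profile). It returns only the optimal score, not the alignment itself.

```python
def global_alignment_score(a, b, subst, gap):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    subst: dict mapping (x, y) symbol pairs to a substitution score.
    gap:   penalty (a negative number) added for each gap position.
    Time is O(len(a) * len(b)); only two rows are kept, so space is O(len(b)).
    """
    n, m = len(a), len(b)
    prev = [j * gap for j in range(m + 1)]  # row 0: prefixes of b aligned to gaps
    for i in range(1, n + 1):
        curr = [i * gap] + [0] * m  # column 0: prefix of a aligned to gaps
        for j in range(1, m + 1):
            curr[j] = max(
                prev[j - 1] + subst[(a[i - 1], b[j - 1])],  # match or mismatch
                prev[j] + gap,      # a[i-1] aligned to a gap
                curr[j - 1] + gap,  # b[j-1] aligned to a gap
            )
        prev = curr
    return prev[m]


# Hypothetical toy scoring scheme: +1 for a match, -1 for a mismatch.
DNA = "ACGT"
TOY_SUBST = {(x, y): (1 if x == y else -1) for x in DNA for y in DNA}

print(global_alignment_score("GATTACA", "GCATGCT", TOY_SUBST, gap=-1))  # prints 0
```

Swapping in a protein matrix only changes the `subst` dictionary; the quadratic cost comes from the table, not from the alphabet size.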

Is the linear-time Burrows–Wheeler transform approach used in the Bowtie and BWA short-read aligners actually the state of the art, or are there sub-linear algorithms that solve the same problem?
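To illustrate what the BWT buys these aligners: after a one-off indexing pass over the reference, an exact-match count query via FM-index backward search takes a number of table lookups proportional to the pattern length, independent of the reference length. The sketch below is deliberately naive (suffix array built by sorting, a full uncompressed occurrence table, `$` assumed absent from the text); Bowtie and BWA use compressed, sampled structures and add mismatch/gap handling on top, so treat this only as an illustration of the query-time argument.

```python
def build_fm_index(text):
    """Naive FM-index over `text`, using '$' as the end-of-text sentinel."""
    if "$" in text:
        raise ValueError("'$' is reserved as the sentinel")
    t = text + "$"
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # toy suffix array, far from linear time
    bwt = "".join(t[i - 1] for i in sa)  # t[-1] is '$' when i == 0
    alphabet = sorted(set(t))
    # C[c]: number of symbols in t strictly smaller than c
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += t.count(c)
    # occ[c][i]: occurrences of c in bwt[:i], precomputed so each lookup is O(1)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)


def count_occurrences(pattern, index):
    """Backward search: O(len(pattern)) lookups, independent of the text length."""
    C, occ, n = index
    lo, hi = 0, n  # half-open range of sorted suffixes matching the pattern suffix seen so far
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


idx = build_fm_index("GATTACA")
print(count_occurrences("TA", idx))  # prints 1
```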

[Edit]: I am thinking of applying LSH for approximate matching, which would be sub-linear assuming pre-processing/indexing of the reference dataset.
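One way the LSH idea could look is MinHash over k-mer sets with banding. This is a minimal sketch under stated assumptions: Jaccard similarity of k-mer sets is used as a stand-in for the alignment score (it is not equivalent to a substitution-matrix/gap-penalty score, so candidates would still need to be verified by alignment), and the names `MinHashLSH`, `minhash_signature` and the parameters `k`, `bands`, `rows` are hypothetical and illustrative. Seeded hashes come from `hashlib.blake2b` so results do not depend on Python's per-process string hashing.

```python
import hashlib
from collections import defaultdict


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _seeded_hash(seed, s):
    """64-bit hash of s under a given seed (one member of a hash family)."""
    return int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")


def minhash_signature(seq, k=4, num_hashes=32):
    """One minimum hash value per seeded hash function over the k-mer set."""
    shingles = kmers(seq, k)
    if not shingles:
        raise ValueError("sequence is shorter than k")
    return [min(_seeded_hash(seed, s) for s in shingles) for seed in range(num_hashes)]


class MinHashLSH:
    """Banded LSH index: sequences that agree on every row of some band become candidates."""

    def __init__(self, bands=8, rows=4, k=4):
        self.bands, self.rows, self.k = bands, rows, k
        self.buckets = defaultdict(set)

    def _band_keys(self, seq):
        sig = minhash_signature(seq, self.k, self.bands * self.rows)
        for b in range(self.bands):
            yield (b, tuple(sig[b * self.rows:(b + 1) * self.rows]))

    def add(self, name, seq):
        for key in self._band_keys(seq):
            self.buckets[key].add(name)

    def query(self, seq):
        hits = set()
        for key in self._band_keys(seq):
            hits |= self.buckets.get(key, set())
        return hits


index = MinHashLSH()
ref = "ACGTACGTTAGCCGATAGGCTA"
index.add("ref1", ref)
print(index.query(ref))  # prints {'ref1'}
```

A query only visits the buckets for its own band keys, so after indexing its cost depends on the signature size and bucket occupancy rather than on scanning the reference. The trade-off is probabilistic: similar sequences can be missed and dissimilar ones can collide, with the balance set by `bands` and `rows`.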

alex

1 Answer

I guess at some point you end up reading the entire sequence, so there cannot be a sub-linear-time algorithm.

zad
  • What about using bitwise operators? – alex Feb 09 '13 at 03:23
  • If you are measuring complexity in terms of the sequence length, using bitwise operators may lower the constant factor but not the order. – zad Feb 09 '13 at 03:37
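To make the comment about bitwise operators concrete: a bit-parallel matcher such as Shift-And (Bitap) keeps the pattern's match state in machine words, so each text character costs about ⌈m/w⌉ word operations (w = word size) instead of O(m), yet every text character is still read once and the scan stays linear in the text length. The same idea underlies bit-parallel approximate matchers such as Myers' bit-vector edit-distance algorithm. The sketch below shows only exact matching, and because Python integers are arbitrary precision, it does not model the word size w.

```python
def shift_and(text, pattern):
    """Return start positions of exact matches of pattern in text (Shift-And / Bitap)."""
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit i set iff pattern[i] == c
    masks = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0  # bit i set: pattern[:i + 1] matches the text ending at the current position
    hits = []
    for j, c in enumerate(text):  # a single pass: each text character is read once
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


print(shift_and("GATTACATTA", "TTA"))  # prints [2, 7]
```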