-1

Let the alphabet be small, e.g. {0, 1} or {A, C, G, T}, and let k <= n.

I want to find the minimum Hamming distance between u = (u_1,...,u_k) and the contiguous subsequences of v = (v_1,...,v_n) of length k, in time O(n log n).

Is it possible?

Thank you for any help!

G H
  • If this allows preprocessing of `v`, what would be the limits on time and space? Without considering `k` fixed, it may show up in the function describing the growth of complexity with problem size. See, for example, [Wikipedia on optimal string alignment](https://en.wikipedia.org/wiki/Hirschberg's_algorithm). (Salutations and aTdHvAaNnKcSe are not welcome here.) – greybeard Nov 16 '15 at 06:37
  • No, `k` is not fixed. For example, if `k = n/2`, then the running time of Hirschberg's algorithm is `O(n^2)`. – G H Nov 16 '15 at 06:43

1 Answer

1

For the alphabet {1, -1}, multiply the polynomials

(u_k + u_{k-1} x + u_{k-2} x^2 + ... + u_1 x^{k-1})

and

(v_1 + v_2 x + v_3 x^2 + ... + v_n x^{n-1}).

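For k-1 <= i <= n-1, the coefficient c_i of x^i in the product is the sum of u_j v_{i-k+1+j} over j = 1, ..., k. This is a simple affine function of the Hamming distance d_i between u_1 ... u_k and v_{i-k+2} ... v_{i+1}: each matching position contributes +1 and each mismatching position contributes -1, so c_i = k - 2 d_i. Computing the product with the FFT takes O(n log n) time, and the answer is the minimum of (k - c_i)/2 over those i.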
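As a rough illustration of the polynomial-product idea, here is a minimal pure-Python sketch. The function names `fft`, `multiply` and `min_hamming_pm1` are made up for this example, and the recursive radix-2 FFT is just one convenient way to get an O(n log n) product; a library FFT would do the same job. It assumes every entry of `u` and `v` is +1 or -1.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for t in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * t / n) * odd[t]
        out[t] = even[t] + w
        out[t + n // 2] = even[t] - w
    return out

def multiply(p, q):
    """Integer coefficients of the product of polynomials p and q."""
    size = 1
    while size < len(p) + len(q) - 1:
        size *= 2
    fp = fft([complex(x) for x in p] + [0j] * (size - len(p)))
    fq = fft([complex(x) for x in q] + [0j] * (size - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # The unnormalised inverse transform is scaled by `size`.
    return [round(c.real / size) for c in prod[:len(p) + len(q) - 1]]

def min_hamming_pm1(u, v):
    """Minimum Hamming distance between u and any length-k window of v."""
    k, n = len(u), len(v)
    # Reversing u gives the polynomial u_k + u_{k-1} x + ... + u_1 x^{k-1}.
    coeffs = multiply(u[::-1], v)
    # For k-1 <= i <= n-1, coeffs[i] = k - 2 * (distance for that window).
    return min((k - coeffs[i]) // 2 for i in range(k - 1, n))
```

For a {0, 1} alphabet, one would first map 0 to 1 and 1 to -1 before calling `min_hamming_pm1`.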

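We can handle other alphabets by embedding them in bit strings so that the Hamming distances work out, e.g.,

A -> 0000
C -> 0011
G -> 0101
T -> 1001.

Any two distinct codewords differ in exactly two bits, so the Hamming distance between the encoded strings is twice the original one. Map the bits 0/1 to 1/-1 and consider only the alignments that start on a codeword boundary.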
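The embedding can be checked with a small sketch. The helper names `encode`, `correlate` and `min_hamming_dna` are hypothetical, and `correlate` is a deliberately naive quadratic polynomial product that stands in for an FFT-based one; only the encoding and the index arithmetic are the point here. The sketch relies on the fact that any two distinct codewords in the table differ in exactly two bits.

```python
CODE = {"A": "0000", "C": "0011", "G": "0101", "T": "1001"}

def encode(s):
    """Replace each symbol by its 4-bit codeword, then map bit 0 -> +1, bit 1 -> -1."""
    return [1 if b == "0" else -1 for ch in s for b in CODE[ch]]

def correlate(p, q):
    """Naive polynomial product; an FFT-based product would replace this."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def min_hamming_dna(u, v):
    """Minimum Hamming distance between u and any window of v of length len(u)."""
    eu, ev = encode(u), encode(v)
    big_k = len(eu)  # 4 * len(u)
    coeffs = correlate(eu[::-1], ev)
    # Only windows starting at a multiple of 4 bits line up with symbol boundaries.
    # A window starting at encoded offset s corresponds to coefficient s + big_k - 1,
    # which equals big_k - 2 * (bit distance).
    bit_best = min(
        (big_k - coeffs[s + big_k - 1]) // 2
        for s in range(0, len(ev) - big_k + 1, 4)
    )
    # Each symbol mismatch costs exactly two bits.
    return bit_best // 2
```

The encoding multiplies the input length by 4, which is a constant factor, so using an FFT product in place of `correlate` keeps the overall running time at O(n log n).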
David Eisenstat