I have this code which computes the Longest Common Subsequence between random strings to see how accurately one can reconstruct an unknown region of the input. To get good statistics I need to iterate it many times but my current python implementation is far too slow. Even using pypy it currently takes 21 seconds to run once and I would ideally like to run it 100s of times.
#!/usr/bin/python
import random
import itertools
#test to see how many different unknowns are compatible with a set of LCS answers.
def lcs(x, y):
n = len(x)
m = len(y)
# table is the dynamic programming table
table = [list(itertools.repeat(0, n+1)) for _ in xrange(m+1)]
for i in range(n+1): # i=0,1,...,n
for j in range(m+1): # j=0,1,...,m
if i == 0 or j == 0:
table[i][j] = 0
elif x[i-1] == y[j-1]:
table[i][j] = table[i-1][j-1] + 1
else:
table[i][j] = max(table[i-1][j], table[i][j-1])
# Now, table[n, m] is the length of LCS of x and y.
return table[n][m]
def lcses(pattern, text):
return [lcs(pattern, text[i:i+2*l]) for i in xrange(0,l)]
l = 15
#Create the pattern
pattern = [random.choice('01') for i in xrange(2*l)]
#create text start and end and unknown.
start = [random.choice('01') for i in xrange(l)]
end = [random.choice('01') for i in xrange(l)]
unknown = [random.choice('01') for i in xrange(l)]
lcslist= lcses(pattern, start+unknown+end)
count = 0
for test in itertools.product('01',repeat = l):
test=list(test)
testlist = lcses(pattern, start+test+end)
if (testlist == lcslist):
count += 1
print count
I tried converting it to numpy but I must have done it badly as it actually ran more slowly. Can this code be sped up a lot somehow?
Update. Following a comment below, it would be better if lcses
used a recurrence directly which gave the LCS between pattern
and all sublists of text
of the same length. Is it possible to modify the classic dynamic programming LCS algorithm somehow to do this?