
I have to compare a large number of enormous strings with each other, and I am using an algorithm like this:

def distance(a, b):
    "Calculates the Levenshtein distance between a and b."
    n, m = len(a), len(b)
    if n > m:
        # Make sure n <= m, to use O(min(n,m)) space
        a, b = b, a
        n, m = m, n

    current_row = list(range(n+1)) # Keep current and previous row, not entire matrix
    for i in range(1, m+1):
        previous_row, current_row = current_row, [i]+[0]*n
        for j in range(1,n+1):
            add, delete, change = previous_row[j]+1, current_row[j-1]+1, previous_row[j-1]
            if a[j-1] != b[i-1]:
                change += 1
            current_row[j] = min(add, delete, change)

    return current_row[n]

After waiting 2-3 hours, I stopped the script and came across Theano.

How can I implement this function with Theano?

  • How "enormous" are your strings? Note that this is a quadratic algorithm. Running it on a pair of long strings may take a lot of time, especially if you use CPython to run it. – user172818 Jul 29 '16 at 00:44
  • Using Python for this purpose is not the wisest way to go. If you are after speed and efficiency, I would recommend a statically typed language such as Java, C++, Go, or C. – Yerken Jul 29 '16 at 04:47
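As the comments point out, most of the time goes into the quadratic number of interpreted loop steps. The following is a minimal sketch of one way to reduce that overhead without leaving Python, assuming NumPy is available; it is not a Theano implementation. The function name `distance_numpy` is made up for illustration. It keeps the same two-row recurrence as the code in the question, but computes each row with array operations, so only the outer loop runs in the interpreter. The algorithm is still O(n*m), so very long strings will remain slow.

```python
import numpy as np

def distance_numpy(a, b):
    """Levenshtein distance with the inner loop vectorized in NumPy (hypothetical helper)."""
    if len(a) > len(b):
        a, b = b, a  # keep the row length at min(len(a), len(b))
    n = len(a)
    # Encode the shorter string as code points so comparisons are vectorized.
    a_codes = np.array([ord(ch) for ch in a], dtype=np.int64)
    idx = np.arange(n + 1, dtype=np.int64)
    previous_row = idx.copy()
    for i, ch in enumerate(b, start=1):
        current_row = np.empty(n + 1, dtype=np.int64)
        current_row[0] = i
        # Candidates that depend only on the previous row:
        # previous_row[j] + 1 and previous_row[j-1] + (characters differ).
        current_row[1:] = np.minimum(
            previous_row[1:] + 1,
            previous_row[:-1] + (a_codes != ord(ch)),
        )
        # The remaining candidate, current_row[j-1] + 1, chains left to right.
        # Unrolled, it is min over k <= j of (candidate[k] + j - k), which a
        # running minimum of (candidate - idx) computes in one pass.
        current_row = np.minimum.accumulate(current_row - idx) + idx
        previous_row = current_row
    return int(previous_row[n])
```

Calling `distance_numpy(s1, s2)` should return the same value as `distance(s1, s2)` from the question, so the two can be checked against each other on short inputs before trying the large ones.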

0 Answers