
I have millions of words in list A and about 100 in list B. I would like to find all the items in list A that look like items in list B. I'm using the Python Levenshtein library, which is written in C, and it works quite well.

But 99% of comparisons will be a waste of time because Levenshtein calculates the distance of words like "apple" and "banana" even when it's quite clear the words don't look like each other.
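One cheap way to skip most of those comparisons, sketched here under the assumption that a distance of 3 or more counts as "not similar": the Levenshtein distance between two words is never smaller than the difference in their lengths, so pairs whose lengths differ by 3 or more can be discarded without computing anything. The names `MAX_DIST` and `candidate_pairs` are hypothetical and not part of any library.

```python
MAX_DIST = 3  # assumed cutoff: distances of 3 or more are treated as "no match"


def candidate_pairs(words_a, words_b, max_dist=MAX_DIST):
    """Yield only the (a, b) pairs whose lengths are close enough to possibly match."""
    for a in words_a:
        for b in words_b:
            # Each insertion or deletion changes the length by exactly one,
            # so the distance is at least abs(len(a) - len(b)).
            if abs(len(a) - len(b)) < max_dist:
                yield a, b


# Toy data standing in for list A (millions of words) and list B (about 100).
pairs = list(candidate_pairs(["apple", "strawberries", "aple"], ["apple", "banana"]))
```

Only the surviving pairs would then need a real distance computation, whichever implementation is used for it.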

What I'd like to do is end the function once the Levenshtein distance reaches 3. I can do that with one line in this Python function I borrowed from somewhere, but I'd prefer to use the Levenshtein library, which is written in C and should be faster.

#!/usr/bin/python
def lvn(a, b):
    "Calculates the Levenshtein distance between a and b."
    n, m = len(a), len(b)
    if n > m:
        # Make sure n <= m, to use O(min(n,m)) space
        a, b = b, a
        n, m = m, n

    current = list(range(n + 1))
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1]

            if a[j - 1] != b[i - 1]:
                change = change + 1
            current[j] = min(add, delete, change)
            if current[j] == 3: return  # what I want to replicate

    return current[n]
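A note on the early exit: a single cell reaching 3 does not mean the final distance is 3 or more. For example, "abc" vs "abcd" has distance 1, yet the last row starts with cells above 3. A cutoff that is safe, as far as I can tell, is to stop once every cell in the current row has reached the limit, because no later cell can be smaller than the minimum of the row before it. The following is a minimal pure-Python sketch of that idea; `lvn_bounded` and `limit` are hypothetical names, and this is not how the C library does it.

```python
def lvn_bounded(a, b, limit=3):
    """Return the Levenshtein distance between a and b, or None if it is >= limit."""
    n, m = len(a), len(b)
    if n > m:
        # Make sure n <= m, to use O(min(n, m)) space
        a, b = b, a
        n, m = m, n

    # The length difference is a lower bound on the distance.
    if m - n >= limit:
        return None

    current = list(range(n + 1))
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1] + (a[j - 1] != b[i - 1])
            current[j] = min(add, delete, change)
        # Every later cell is at least the minimum of this row, so once the
        # whole row has reached the limit the final distance must have too.
        if min(current) >= limit:
            return None

    return current[n] if current[n] < limit else None
```

Returning `None` rather than a number keeps "too far apart" distinct from a real distance, so callers can filter with `is not None`.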
  • I _don't_ think that line works. And the indentation in your example is off. – musically_ut Sep 02 '15 at 11:30
  • Next time you post a question, include the details of what you are actually using, including a **link to the library you want help with**. I've now posted 3 times and do not have a link to the lib you are using. You are never going to get a good answer if you don't ask a good question. – mikeb Sep 02 '15 at 12:17

1 Answer


Why don't you pass in a depth parameter:

def lvn(a, b, depth):

and increment it each time you recurse, and return 1000 when you get to 3?

mikeb
  • This function does work as it is - but I want to know if there is a way to use the Levenshtein library itself to do the same thing because my function is probably inefficient – user2998946 Sep 02 '15 at 11:38
  • Which library? There is a C one, a python one, a lisp one, a java one, a ruby one, and several of those have *more* than one implementation. – mikeb Sep 02 '15 at 11:46
  • The Python one - which is written in C, and so I don't understand it – user2998946 Sep 02 '15 at 11:59