I have millions of words in list A and about 100 in list B. I would like to find all the items in list A that look like items in list B. I'm using the Python Levenshtein library, which is written in C, and it works quite well.
But 99% of the comparisons will be a waste of time, because Levenshtein calculates the full distance between words like "apple" and "banana" even when it's quite clear the words don't look like each other.
What I'd like to do is end the function once the Levenshtein distance reaches 3. I can do that with one line in this Python function I borrowed from somewhere. But I'd prefer to use the Levenshtein library, which is built in C and should be faster.
#!/usr/bin/python
def lvn(a, b):
    "Calculates the Levenshtein distance between a and b."
    n, m = len(a), len(b)
    if n > m:
        # Make sure n <= m, to use O(min(n,m)) space
        a, b = b, a
        n, m = m, n
    current = range(n+1)
    for i in range(1, m+1):
        previous, current = current, [i]+[0]*n
        for j in range(1, n+1):
            add, delete = previous[j]+1, current[j-1]+1
            change = previous[j-1]
            if a[j-1] != b[i-1]:
                change = change + 1
            current[j] = min(add, delete, change)
            if current[j]==3: return #what I want to replicate
    return current[n]
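
One caveat about the early exit in the function above: a single cell reaching 3 does not mean the final distance is 3 or more, so it can bail out on words that are actually close (even identical ones, as far as I can tell). Here is a minimal pure-Python sketch of a safer cutoff, assuming that "give up" can be signalled by returning None. The names `bounded_levenshtein` and `limit` are made up for illustration and are not part of the Levenshtein library. It relies on two facts: the distance is never smaller than the difference in lengths, and the smallest value in a row never decreases from one row to the next.

```python
def bounded_levenshtein(a, b, limit=3):
    """Return the Levenshtein distance between a and b if it is below
    `limit`; return None as soon as it is known to be >= limit.

    Hypothetical helper for illustration, not a library function.
    """
    n, m = len(a), len(b)
    if n > m:
        # Make sure n <= m, to use O(min(n, m)) space.
        a, b = b, a
        n, m = m, n
    # The distance is at least the difference in lengths,
    # so very unequal words can be rejected without any work.
    if m - n >= limit:
        return None
    current = list(range(n + 1))
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1] + (a[j - 1] != b[i - 1])
            current[j] = min(add, delete, change)
        # Row minima never decrease, so once the whole row is at or
        # above the limit the final distance must be as well.
        if min(current) >= limit:
            return None
    return current[n] if current[n] < limit else None
```

With this version, `bounded_levenshtein("apple", "apply")` gives 1, while `bounded_levenshtein("apple", "banana")` gives None after only a few rows. It is still pure Python, so it only illustrates the cutoff logic; it does not address the speed advantage of the C library.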