Say you have scanned a document that contains names. Because the scanning process introduces errors, you want to check each name against a dictionary. For that, you need a function that takes a possibly misread name and returns a list of every string within a Levenshtein distance of 1 of the input.
I adapted an implementation from Rosetta Code (https://rosettacode.org/wiki/Levenshtein_distance#Python) but haven't gotten the right result yet. Levenshtein implementations usually take two strings and return an int, the distance between them. How can I change that to generate the variations of a single string?
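One way to get the variations of a single string is to generate them directly: every string at distance exactly 1 comes from one deletion, one substitution, or one insertion. This sketch assumes the alphabet is ASCII letters only (`ALPHABET` and `levenshtein_variations` are hypothetical names, not from Rosetta Code). It returns a set rather than a list and does not include transpositions, which belong to Damerau-Levenshtein distance.

```python
import string

# Assumption: candidate characters are ASCII letters only. Real names may
# need accented letters, hyphens or apostrophes added to this alphabet.
ALPHABET = string.ascii_letters


def levenshtein_variations(word, alphabet=ALPHABET):
    """Return every string within Levenshtein distance 1 of `word`,
    including `word` itself (distance 0)."""
    # Every way to cut the word into a prefix and a suffix.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Delete one character.
    deletes = [left + right[1:] for left, right in splits if right]
    # Replace one character with any letter of the alphabet.
    replaces = [left + c + right[1:] for left, right in splits if right for c in alphabet]
    # Insert one letter at any position, including both ends.
    inserts = [left + c + right for left, right in splits for c in alphabet]
    # The set drops duplicates (e.g. inserting 'a' before or after an existing 'a').
    return set(deletes + replaces + inserts) | {word}


for correct, possible in [('Marcus', 'Maacus'), ('David', 'Davide'), ('Steve', 'Steven')]:
    print(f"{correct} found: {correct in levenshtein_variations(possible)}")
```

For a name of length n and an alphabet of k letters this builds at most n + nk + (n+1)k strings, so a six-letter name with 52 letters gives at most 682 candidates. Keeping the dictionary in a `set` makes checking each candidate cheap.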
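```python
def levenshteinVariation(n_possible):
    m = n_possible
    n = n_correct  # not a parameter: taken from the calling (global) scope
    # First column: d[i] starts as [i]; first row: d[0] = [0, 1, ..., len(n)]
    d = []
    for i in range(len(m)+1):
        d.append([i])
    del d[0][0]
    for j in range(len(n)+1):
        d[0].append(j)
    for j in range(1,len(n)+1):
        for i in range(1,len(m)+1):
            if m[i-1] == n[j-1]:
                d[i].insert(j,d[i-1][j-1])
            else:
                # substitution is counted with cost 2 (like a delete plus an insert)
                minimum = min(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+2)
                d[i].insert(j, minimum)
    return d  # the full distance matrix (a list of lists of ints)
```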
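Alternatively, the two-string shape can be kept and the possible name compared against each dictionary entry. Below is a sketch of the same dynamic-programming idea with both strings as parameters (instead of the global `n_correct`), a substitution cost of 1 rather than 2, and the bottom-right cell returned instead of the whole matrix. `levenshtein_distance`, `matches_within_one` and the `names` list are illustrative names, not from the original.

```python
def levenshtein_distance(a, b):
    """Classic Levenshtein distance: insertion, deletion and
    substitution each cost 1."""
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    # The bottom-right cell is the distance between the full strings.
    return d[len(a)][len(b)]


def matches_within_one(possible, dictionary):
    """Return the dictionary entries within distance 1 of `possible`."""
    return [word for word in dictionary if levenshtein_distance(word, possible) <= 1]


names = ['Marcus', 'David', 'Steve', 'Stephen']  # hypothetical dictionary
for correct, possible in [('Marcus', 'Maacus'), ('David', 'Davide'), ('Steve', 'Steven')]:
    print(f"{correct} found: {correct in matches_within_one(possible, names)}")
```

This fills one full matrix per dictionary entry, so its cost grows with the size of the dictionary, whereas generating the variations of one name and looking them up in a set grows with the length of the name instead.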
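The expected result is that each correct name is found among the variations of its misread form, since every pair below differs by a single edit (one substituted letter in 'Maacus', one extra letter in 'Davide' and 'Steven'):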
```python
for n_correct, n_possible in [('Marcus','Maacus'), ('David','Davide'), ('Steve', 'Steven')]:
    print(f"{n_correct} found: {n_correct in levenshteinVariation(n_possible)}")
```
But I got:
```
Marcus found: False
David found: False
Steve found: False
```
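Since only a threshold of 1 matters here, the matrix can also be skipped entirely: two strings are within distance 1 when their lengths differ by at most 1 and a single left-to-right pass finds at most one mismatch. This is a common "one edit away" check, not part of the Rosetta Code implementation; `within_one_edit` is a hypothetical name, and the sketch runs it on the same three pairs.

```python
def within_one_edit(a, b):
    """True if the Levenshtein distance between a and b is at most 1,
    checked in one pass without building a DP matrix."""
    if abs(len(a) - len(b)) > 1:
        return False
    # Make `a` the shorter (or equal-length) string.
    if len(a) > len(b):
        a, b = b, a
    i = j = 0
    edited = False
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
            continue
        if edited:
            return False  # a second mismatch means distance >= 2
        edited = True
        if len(a) == len(b):
            i += 1  # equal lengths: treat the mismatch as a substitution
        j += 1      # otherwise skip the extra character of the longer string
    # Any single leftover character of `b` is the one allowed insertion.
    return True


for correct, possible in [('Marcus', 'Maacus'), ('David', 'Davide'), ('Steve', 'Steven')]:
    print(f"{correct} found: {within_one_edit(correct, possible)}")
```

To search a dictionary, filter it with `[w for w in names if within_one_edit(w, possible)]`; each comparison is linear in the name length instead of quadratic.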