So my users wrote their addresses in a registration form, but a lot of them have typos. I have another list retrieved from the city records with the correct spelling of those addresses. So let's say I have "Brooklny" typed by them and I have the list of correct names: Brooklyn, Manhattan, Bronx, Staten Island, Queens (this is an example, the actual addresses are in Spanish and refer to neighborhoodS in Mexico City).
I want to find the edit distance between Brooklyn and each of the borough names and then find the word to whick Brooklyn has the minimum edit distance.
So edit distance between: Brooklny-Brooklyn is 2, Brooklny-Bronx is 4 and so on. The minimum of course is 2 with Brooklyn.
Imagine that I have Brooklny in cell A1 and Brooklyn, Manhattan, Bronx, Staten Island, Queens each in a cell from B1:B6
Im doing this in VBA for a user defined function in Excel and so far I have this code but it doesnt work.
Function Minl(ByVal string1 As String, ByVal correctos As Range) As Variant
Dim distancias(3) As Integer
Dim i, minimo As Integer
i = 0
For Each c In correctos.Cells
distancias(i) = Levenshtein(string1, c.Value)
i = i + 1
Next c
Minl = Minrange(distancias)
End Function
Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long
Dim i As Long, j As Long
Dim string1_length As Long
Dim string2_length As Long
Dim distance() As Long
string1_length = Len(string1)
string2_length = Len(string2)
ReDim distance(string1_length, string2_length)
For i = 0 To string1_length
distance(i, 0) = i
Next
For j = 0 To string2_length
distance(0, j) = j
Next
For i = 1 To string1_length
For j = 1 To string2_length
If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
distance(i, j) = distance(i - 1, j - 1)
Else
distance(i, j) = Application.WorksheetFunction.Min _
(distance(i - 1, j) + 1, _
distance(i, j - 1) + 1, _
distance(i - 1, j - 1) + 1)
End If
Next
Next
Levenshtein = distance(string1_length, string2_length)
End Function
Function Minrange(ParamArray values() As Variant) As Variant
Dim minValue, Value As Variant
minValue = values(0)
For Each Value In values
If Value < minValue Then minValue = Value
Next
Minrange = minValue
End Function
I think the algorithm is right but I think I might be having trouble with the syntax. The levenshtein function works but im not sure about the other two.