0

So my users wrote their addresses in a registration form, but a lot of them have typos. I have another list retrieved from the city records with the correct spelling of those addresses. So let's say I have "Brooklny" typed by them and I have the list of correct names: Brooklyn, Manhattan, Bronx, Staten Island, Queens (this is an example, the actual addresses are in Spanish and refer to neighborhoodS in Mexico City).

I want to find the edit distance between Brooklyn and each of the borough names and then find the word to whick Brooklyn has the minimum edit distance.

So edit distance between: Brooklny-Brooklyn is 2, Brooklny-Bronx is 4 and so on. The minimum of course is 2 with Brooklyn.

Imagine that I have Brooklny in cell A1 and Brooklyn, Manhattan, Bronx, Staten Island, Queens each in a cell from B1:B6

Im doing this in VBA for a user defined function in Excel and so far I have this code but it doesnt work.

Function Minl(ByVal string1 As String, ByVal correctos As Range) As Variant

Dim distancias(3) As Integer
Dim i, minimo As Integer
i = 0
For Each c In correctos.Cells
    distancias(i) = Levenshtein(string1, c.Value)
    i = i + 1
Next c

Minl = Minrange(distancias)

End Function

Function Levenshtein(ByVal string1 As String, ByVal string2 As String) As Long

Dim i As Long, j As Long
Dim string1_length As Long
Dim string2_length As Long
Dim distance() As Long

string1_length = Len(string1)
string2_length = Len(string2)
ReDim distance(string1_length, string2_length)

For i = 0 To string1_length
distance(i, 0) = i
Next

For j = 0 To string2_length
    distance(0, j) = j
Next

For i = 1 To string1_length
    For j = 1 To string2_length
        If Asc(Mid$(string1, i, 1)) = Asc(Mid$(string2, j, 1)) Then
            distance(i, j) = distance(i - 1, j - 1)
        Else
            distance(i, j) = Application.WorksheetFunction.Min _
            (distance(i - 1, j) + 1, _
            distance(i, j - 1) + 1, _
            distance(i - 1, j - 1) + 1)
        End If
    Next
Next

Levenshtein = distance(string1_length, string2_length)

End Function

Function Minrange(ParamArray values() As Variant) As Variant
Dim minValue, Value As Variant
minValue = values(0)
For Each Value In values
   If Value < minValue Then minValue = Value
Next
Minrange = minValue
End Function

I think the algorithm is right but I think I might be having trouble with the syntax. The levenshtein function works but im not sure about the other two.

1 Answers1

0

To get the closest output you could use this:

Function get_match(ByVal str As String, rng As Range) As String
  Dim itm As Variant, outp(0 To 2) As Variant
  outp(1) = 0: outp(2) = ""
  For Each itm In rng.Text
    outp(0) = Levenshtein(itm, str)
    If outp(0) = 0 Then
      get_match = itm
      Exit Function
    ElseIf outp(1) = 0 Or outp(0) < outp(1) Then
      outp(1) = outp(0)
      outp(2) = itm
    End If
  Next
  get_match = outp(1)
End Function

to get the distance later, you simply could run an Levenshtein(string,get_match(string,range))

Still... I'm not exactly sure what you are looking for :/

Dirk Reichel
  • 7,989
  • 1
  • 15
  • 31