
I want to match a string and later replace it with its closest match. I am using the stringdist package. Below is my code:

    library(stringdist)
    stringdistmatrix("2 ltr thums up", c("solar thyme 30g", "Thums Up 2 L"), method = "lv")

It gives the following output:

         [,1] [,2]
    [1,]    8   12

This means that "solar thyme 30g" (distance 8) is closer to "2 ltr thums up" than "Thums Up 2 L" (distance 12), but in reality "Thums Up 2 L" should be the closer match. Should I switch from the Levenshtein method to something else?
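Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one raw string into the other. Reordered words and different capitalisation therefore each cost many edits, which is why "Thums Up 2 L" scores worse here. One common workaround, not taken from this thread, is to normalise both strings before comparing them, for example by lower-casing and sorting their words. The sketch below uses base R's `adist()` (unit-cost Levenshtein) so it runs without stringdist; `normalize_tokens()` is a hypothetical helper written for this example, and token sorting is an assumption that will not suit every data set (it cannot reconcile "ltr" with "L", for instance).

```r
# Hypothetical helper: lower-case a string and sort its whitespace-separated
# tokens, so that word order and capitalisation no longer affect the distance.
normalize_tokens <- function(s) {
  tokens <- strsplit(tolower(s), "[[:space:]]+")[[1]]
  # method = "radix" sorts in the C locale, so the token order is reproducible
  paste(sort(tokens, method = "radix"), collapse = " ")
}

query      <- "2 ltr thums up"
candidates <- c("solar thyme 30g", "Thums Up 2 L")

# Plain Levenshtein distance (base R's adist) on the raw strings
raw_dist <- adist(query, candidates)

# Levenshtein distance after normalising both sides
norm_dist <- adist(normalize_tokens(query),
                   vapply(candidates, normalize_tokens, character(1),
                          USE.NAMES = FALSE))

# "Replace with the closest match": keep the candidate with the smallest distance
best_match <- candidates[which.min(norm_dist)]

print(raw_dist)
print(norm_dist)
print(best_match)
```

Taking `which.min()` of the distances is the "replace with the closest match" step; stringdist's `amatch()` performs a similar lookup, returning the position of the closest entry in a table within a `maxDist` threshold.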

nk23
You can try the Jaro-Winkler distance (`method = "jw"`), which I often find a better metric for matching strings. It is scaled between 0 and 1 (as a distance in stringdist, 0 is an exact match and 1 is a total non-match; `stringsim()` reports the reverse, as a similarity). However, there is simply no magic way to make any of these functions reliably link character strings by their underlying referents. If you have more data for each string, you can try a record-linkage approach. – gfgm Feb 15 '19 at 11:37
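To make the Jaro-Winkler scaling concrete, here is a sketch of the textbook Jaro-Winkler similarity in base R. It is not stringdist's implementation: `jaro_sim()` and `jaro_winkler_sim()` are hypothetical names, and the sketch assumes the usual matching window of `floor(max(n1, n2) / 2) - 1`, a common prefix capped at 4 characters, Winkler's factor `p = 0.1`, and no boost threshold. As a similarity it gives 1 for identical strings and 0 for strings with no matching characters; `stringdist(..., method = "jw")` reports the corresponding distance (1 minus the similarity) and uses `p = 0` by default, which gives the plain Jaro distance unless `p` is set.

```r
# Textbook Jaro similarity: 1 for identical strings, 0 for no matching characters.
jaro_sim <- function(s1, s2) {
  n1 <- nchar(s1); n2 <- nchar(s2)
  if (n1 == 0 && n2 == 0) return(1)
  if (n1 == 0 || n2 == 0) return(0)
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  # Equal characters only count as a match if they are at most this far apart
  window <- max(floor(max(n1, n2) / 2) - 1, 0)
  a_used <- logical(n1); b_used <- logical(n2)
  for (i in seq_len(n1)) {
    lo <- max(1, i - window); hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!b_used[j] && a[i] == b[j]) {
        a_used[i] <- TRUE; b_used[j] <- TRUE
        break
      }
    }
  }
  m <- sum(a_used)
  if (m == 0) return(0)
  # Transpositions: half the matched characters that appear in a different order
  transp <- sum(a[a_used] != b[b_used]) / 2
  (m / n1 + m / n2 + (m - transp) / m) / 3
}

# Winkler's adjustment: boost pairs that share a prefix of up to 4 characters.
jaro_winkler_sim <- function(s1, s2, p = 0.1) {
  j <- jaro_sim(s1, s2)
  a <- strsplit(s1, "")[[1]]; b <- strsplit(s2, "")[[1]]
  l <- 0
  for (k in seq_len(min(4, length(a), length(b)))) {
    if (a[k] != b[k]) break
    l <- l + 1
  }
  j + l * p * (1 - j)
}

query      <- "2 ltr thums up"
candidates <- c("solar thyme 30g", "Thums Up 2 L")
# Compare lower-cased strings; larger values mean more similar
sims <- vapply(tolower(candidates), jaro_winkler_sim, numeric(1),
               s1 = tolower(query), USE.NAMES = FALSE)
print(sims)
```

On the lower-cased example strings this sketch ranks "Thums Up 2 L" above "solar thyme 30g"; for the classic pair "MARTHA"/"MARHTA" it gives about 0.961.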

1 Answer


I tried `method = "cosine"` and the output looks fine.
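In stringdist, `method = "cosine"` compares q-gram count profiles rather than aligned characters, and `q` defaults to 1, so each string is reduced to counts of its single characters. The sketch below reimplements that idea in base R to show why reordering words barely matters; `split_qgrams()` and `cosine_qgram_dist()` are hypothetical helpers, not stringdist functions, and they assume both strings are at least `q` characters long.

```r
# Split a string into its overlapping q-grams (q = 1 gives single characters)
split_qgrams <- function(s, q = 1) {
  n <- nchar(s)
  substring(s, seq_len(n - q + 1), seq(q, n))
}

# Cosine distance between q-gram count vectors: 0 means identical q-gram profiles
cosine_qgram_dist <- function(s1, s2, q = 1) {
  stopifnot(nchar(s1) >= q, nchar(s2) >= q)
  g1 <- split_qgrams(s1, q); g2 <- split_qgrams(s2, q)
  keys <- union(g1, g2)
  # Count how often each distinct q-gram occurs in each string
  v1 <- tabulate(match(g1, keys), nbins = length(keys))
  v2 <- tabulate(match(g2, keys), nbins = length(keys))
  1 - sum(v1 * v2) / sqrt(sum(v1^2) * sum(v2^2))
}

query      <- "2 ltr thums up"
candidates <- c("solar thyme 30g", "Thums Up 2 L")
# Case-sensitive, as in the stringdist call from the question
d <- vapply(candidates, cosine_qgram_dist, numeric(1),
            s1 = query, USE.NAMES = FALSE)
print(d)
```

The flip side of ignoring order is that anagrams look identical: with `q = 1`, "ab" and "ba" have distance 0. Raising `q` (for example to 2) keeps some local character order.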

nk23
    "Looks fine" how? More information on what you tried and what your criteria are will make this helpful to other users in the future – camille May 03 '19 at 23:04