I'm curious whether anyone has a good method for choosing the best match among a set of strings. For example, say I have a table with the keys “Hi there”, “Hello”, “Hiya”, “hi”, “Hi”, and “Hey there”, and I want to find the closest match for “Hi”. It should match “Hi” first. If that isn't found, then “hi”, then “Hiya”, and so on: prioritizing perfect matches, then matches that differ only in letter case, then whichever has the fewest differences or the smallest length difference.

My current method seems unwieldy: it first checks for a perfect match, then loops over the keys with string.match, saving whichever has the closest string.len.

  • I think you'll need the Levenshtein distance for this. [See here](https://stackoverflow.com/questions/42681501/how-do-you-make-a-string-dictionary-function-in-lua) – Dec 11 '17 at 08:46

1 Answer

If you're not looking only for perfect matches, you need some metric that measures similarity, and then you look for the candidate that scores closest to your target.
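As a rough sketch of that idea, the lookup can take the metric as a parameter and keep whichever key scores lowest. The names `closest_key` and `tiered_metric` are made up for illustration, and the tiered scoring (exact match, then case-insensitive match, then length difference) is only one assumed reading of the priorities described in the question. It also assumes every key in the table is a string.

```lua
-- Hypothetical helper: returns the key of `tbl` with the lowest score
-- according to `metric(candidate, target)` (lower means more similar).
local function closest_key(tbl, target, metric)
  local best_key, best_score
  for key in pairs(tbl) do
    local score = metric(key, target)
    -- Break ties on the key itself so the result does not depend on
    -- the unspecified iteration order of pairs().
    if best_score == nil or score < best_score
        or (score == best_score and key < best_key) then
      best_key, best_score = key, score
    end
  end
  return best_key, best_score
end

-- Example metric (an assumption, not a standard one): 0 for an exact match,
-- 1 for a match that differs only in case, otherwise 2 plus the length difference.
local function tiered_metric(candidate, target)
  if candidate == target then return 0 end
  if candidate:lower() == target:lower() then return 1 end
  return 2 + math.abs(#candidate - #target)
end

local greetings = {
  ["Hi there"] = 1, ["Hello"] = 2, ["Hiya"] = 3,
  ["hi"] = 4, ["Hi"] = 5, ["Hey there"] = 6,
}

-- Prints the best key and its score: "Hi" with score 0.
print(closest_key(greetings, "Hi", tiered_metric))
```

Because the metric is just a function argument, swapping in a different measure of similarity later does not require touching the loop.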

As McBarby suggested in his comment, you can use the Levenshtein distance, which is the minimum number of single-character edits (insertions, deletions, or substitutions) needed to get from string 1 to string 2. Research which metrics are available and which one suits your needs best. Of course, you can also define your own metric.
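For reference, here is a minimal sketch of the Levenshtein distance in plain Lua, using the usual dynamic-programming approach with two rows. The function name `levenshtein` is my own choice, and the code compares bytes, so it assumes single-byte characters (multi-byte UTF-8 characters would count as several edits).

```lua
-- Levenshtein distance between strings a and b, counted in bytes.
-- Keeps only two rows of the dynamic-programming table in memory.
local function levenshtein(a, b)
  local len_a, len_b = #a, #b
  if len_a == 0 then return len_b end
  if len_b == 0 then return len_a end

  -- prev[j] is the distance between the empty prefix of a and b:sub(1, j).
  local prev = {}
  for j = 0, len_b do prev[j] = j end

  for i = 1, len_a do
    local curr = { [0] = i }
    local byte_a = a:byte(i)
    for j = 1, len_b do
      local cost = (byte_a == b:byte(j)) and 0 or 1
      curr[j] = math.min(
        prev[j] + 1,        -- deletion
        curr[j - 1] + 1,    -- insertion
        prev[j - 1] + cost  -- substitution (or a match when cost is 0)
      )
    end
    prev = curr
  end
  return prev[len_b]
end

print(levenshtein("Hi", "Hiya"))        -- 2 (two insertions)
print(levenshtein("Hi", "hi"))          -- 1 (one substitution)
print(levenshtein("kitten", "sitting")) -- 3
```

Note that this distance treats a case change like any other substitution, so "hi" and "Ha" are equally far from "Hi". If case-only differences should rank higher, one option is to compare lowercased strings first.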

https://en.wikipedia.org/wiki/String_metric lists a number of other string metrics:

Sørensen–Dice coefficient

Block distance or L1 distance or City block distance

Jaro–Winkler distance

Simple matching coefficient (SMC)

Jaccard similarity or Jaccard coefficient or Tanimoto coefficient

Tversky index

Overlap coefficient

Variational distance

Hellinger distance or Bhattacharyya distance

Information radius (Jensen–Shannon divergence)

Skew divergence

Confusion probability

Tau metric, an approximation of the Kullback–Leibler divergence

Fellegi and Sunters metric (SFS)

Maximal matches

Grammar-based distance

TFIDF distance metric
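To give a feel for how one of the listed metrics differs from an edit distance, here is a small sketch of the Sørensen–Dice coefficient computed over character bigrams. The function names are hypothetical, the bigrams are taken by byte, and returning 0 for two different strings that are both shorter than two characters is a convention I assumed rather than part of the definition.

```lua
-- Counts each two-byte substring (bigram) of s.
local function bigram_counts(s)
  local counts, total = {}, 0
  for i = 1, #s - 1 do
    local gram = s:sub(i, i + 1)
    counts[gram] = (counts[gram] or 0) + 1
    total = total + 1
  end
  return counts, total
end

-- Sørensen–Dice coefficient: 2 * shared bigrams / total bigrams, in [0, 1].
-- Unlike a distance, a higher value means the strings are more similar.
local function dice_coefficient(a, b)
  if a == b then return 1 end
  local counts_a, total_a = bigram_counts(a)
  local counts_b, total_b = bigram_counts(b)
  -- Assumed convention: different strings with no bigrams at all score 0.
  if total_a + total_b == 0 then return 0 end
  local shared = 0
  for gram, n in pairs(counts_a) do
    shared = shared + math.min(n, counts_b[gram] or 0)
  end
  return 2 * shared / (total_a + total_b)
end

print(dice_coefficient("Hi", "Hiya"))     -- 0.5
print(dice_coefficient("night", "nacht")) -- 0.25
```

Since this is a similarity rather than a distance, a closest-match search would pick the candidate with the highest value instead of the lowest.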

Piglet