7

Basically, I'm trying to find the closest match (not necessarily an exact one) for a String within an array of Strings.

For example, find "delicous" in {"pie", "delicious", "test"}

This example is pretty obvious, but the values in the array might not always be that distinct.

Could someone please help me with a way to achieve this?

wattostudios
Alex Coleman

3 Answers

19

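It depends on how you define "closest", but one common way is to use a Levenshtein distance score, which counts the single-character insertions, deletions, and substitutions needed to turn one string into another. Apache Commons Lang has such a method in StringUtils (getLevenshteinDistance).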

From there your search method basically becomes: find the string in the collection which has the smallest Levenshtein distance for a given input.
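As a rough illustration of that search, here is a minimal sketch that does not depend on Apache Commons. The class name `ClosestMatch` and the methods `levenshtein` and `closest` are made up for this example; the distance function is a plain hand-written dynamic-programming version, not the library's implementation.

```java
import java.util.Arrays;
import java.util.List;

class ClosestMatch {

    // Levenshtein distance: insertions, deletions and substitutions each cost 1.
    // Only two rows of the DP table are kept at a time.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the candidate with the smallest distance to the input,
    // or null if the list is empty. Ties go to the earliest candidate.
    static String closest(String input, List<String> candidates) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int d = levenshtein(input, candidate);
            if (d < bestDistance) {
                bestDistance = d;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pie", "delicious", "test");
        System.out.println(closest("delicous", words));
    }
}
```

With the example from the question, "delicous" is one insertion away from "delicious", so that candidate is chosen. If you already use Apache Commons Lang, you could presumably swap `levenshtein` for `StringUtils.getLevenshteinDistance` and keep the same loop.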

Andrew White
  • Thanks, this is working pretty well :) If I have chocolatedessert and chocolatepie and I type chocolatedes it seems to go for pie still, but it's still much better than before :p Thanks a ton! – Alex Coleman May 31 '12 at 03:28
  • Second link is broken. Please update. (this is possibly the link - https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#getLevenshteinDistance(java.lang.CharSequence,%20java.lang.CharSequence) ) – Kazekage Gaara Feb 14 '16 at 00:44
2

There's nothing built into Java for that. You might try a third-party library like SecondString or FREJ.

Ted Hopp
2

Another approach that can be used in conjunction with Levenshtein Distance is taking the phonetic representation of the words first. One algorithm to do this is Metaphone.

The user guide for Apache Commons Codec has details of this and some other encoders.
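To show the idea of comparing phonetic keys rather than raw spellings, here is a small sketch. Note that it is not Metaphone: it is a hand-written, simplified American Soundex, used only because it fits in a few lines, and the class name `PhoneticKey` and method `soundex` are invented for this example. In real code you would more likely use the Metaphone or Soundex encoders that ship with Apache Commons Codec.

```java
class PhoneticKey {

    // Digit for each letter a..z; '0' means the letter is not coded
    // (vowels, y, h and w).
    private static final String CODES = "01230120022455012623010202";

    // Simplified American Soundex: first letter plus up to three digits,
    // padded with zeros. Non-letters are ignored.
    static String soundex(String word) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (char raw : word.toCharArray()) {
            char c = Character.toLowerCase(raw);
            if (c < 'a' || c > 'z') {
                continue; // skip anything that is not a basic Latin letter
            }
            char code = CODES.charAt(c - 'a');
            if (out.length() == 0) {
                out.append(Character.toUpperCase(c)); // keep the first letter
                lastCode = code;
            } else if (c == 'h' || c == 'w') {
                // h and w are dropped and do not separate equal codes
            } else if (code == '0') {
                lastCode = '0'; // a vowel separates repeated codes
            } else {
                if (code != lastCode) {
                    out.append(code);
                }
                lastCode = code;
            }
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() > 0 && out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Both spellings reduce to the same key, so they compare as equal.
        System.out.println(soundex("delicous") + " " + soundex("delicious"));
    }
}
```

One way to combine the two approaches is to encode the input and every candidate first, then run the Levenshtein comparison on the keys instead of the original strings, so that spellings that sound alike end up at distance zero.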

Greg Kopff