I know I could use Lucene or Solr, but is there a simple Java library that does only the fuzzy full-text search part, for example:

SomeScore score = fuzzyFullTextSearch(String text, String searchTerm, int maxDistance)

where ''score'' measures how often the (fuzzy) searchTerm was found and how similar each match was to the original searchTerm.

I'm not using Lucene or similar because it is too bulky for my use case and I need the search only once. Also, Lucene's FuzzyQuery supports a maximum edit distance of only 2, which is not enough for my use case.

Is there a lightweight library that can do something like the above?

eSKape

1 Answer

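As usual, Apache Commons comes to the rescue.

org.apache.commons.lang3.StringUtils has several methods for this, such as getFuzzyDistance and getLevenshteinDistance, plus a few more complex metrics. Newer versions of commons-lang3 deprecate these in favor of the equivalents in commons-text.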

So, naive pseudocode will be something like this:

split the text into tokens by spaces, commas, etc.
for each token
    calcDistanceBetweenTokenAndSearchTerm
getSumScore // or avg or whatever
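
The pseudocode above could be sketched in plain Java roughly as follows. This is only a minimal sketch under a few assumptions: the class name FuzzySearchSketch and the Score type are hypothetical, the Levenshtein distance is hand-rolled so the example has no dependencies (the Commons methods could be swapped in), tokens are split on anything that is not a letter or digit, and similarity is taken as 1 - distance / length of the longer string.

```java
import java.util.Locale;

// Hypothetical helper class; the name and the Score type are illustrative only.
class FuzzySearchSketch {

    /** Assumed result type: number of fuzzy matches and their mean similarity in [0, 1]. */
    public static final class Score {
        public final int matches;
        public final double averageSimilarity;

        Score(int matches, double averageSimilarity) {
            this.matches = matches;
            this.averageSimilarity = averageSimilarity;
        }
    }

    /** Classic Levenshtein distance (insert, delete, substitute) using two rolling rows. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Counts tokens within maxDistance of searchTerm (case-insensitive) and averages their similarity. */
    public static Score fuzzyFullTextSearch(String text, String searchTerm, int maxDistance) {
        String needle = searchTerm.toLowerCase(Locale.ROOT);
        int matches = 0;
        double similaritySum = 0.0;
        // Assumption: a token is any run of letters or digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields a leading empty string if the text starts with a delimiter
            }
            int distance = levenshtein(token, needle);
            if (distance <= maxDistance) {
                matches++;
                int longest = Math.max(token.length(), needle.length());
                similaritySum += longest == 0 ? 1.0 : 1.0 - (double) distance / longest;
            }
        }
        return new Score(matches, matches == 0 ? 0.0 : similaritySum / matches);
    }
}
```

Since this compares whole tokens only, a multi-word searchTerm would need extra handling (for example, sliding a window of several tokens over the text), and the cost is one distance computation per token, which is fine for a one-off search but not for repeated queries.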

Another approach could be to use org.apache.commons.text.similarity.FuzzyScore from commons-text, which calculates a similarity score between two strings (higher means a better match, so it is a score rather than a distance). Of course, a lot depends on your exact requirements.

This doesn't cover every possible answer, but you could give it a try.

Mysterion
  • So I assume there is no open-source library for this yet, even though doing this without an indexing technology like Lucene seems to be a very common use case (even if indexing is more efficient). Maybe I will provide a library in the future. – eSKape Jan 13 '17 at 07:37