I would like to check if a keyword string is contained within a text string. This must be a fuzzy contains.
My first attempt was to use the fuzzywuzzy library. Its partial ratio behaved unexpectedly, producing high match scores even when the strings differed quite a lot.
I've also tried Levenshtein distance, which works for comparing one whole string to another but not for finding whether a string contains a keyword.
One idea I tried was to split the text into individual words and loop through them, calculating the distance to each one to see if there is a match. The problem is that the keyword may contain whitespace, in which case no single word can match it and this method finds nothing.
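For reference, here is a minimal sketch of that split-and-compare attempt. The class and method names (`WordSplitFuzzyContains`, `containsWord`) and the `maxDistance` threshold are made up for illustration; the distance function is a plain two-row Levenshtein implementation, not taken from any particular library.

```java
import java.util.Locale;

final class WordSplitFuzzyContains {

    // Classic Levenshtein distance, keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if any single whitespace-separated word of the text is within
    // maxDistance edits of the keyword (hypothetical helper).
    static boolean containsWord(String text, String keyword, int maxDistance) {
        String k = keyword.toLowerCase(Locale.ROOT);
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (levenshtein(word, k) <= maxDistance) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "The Quick Brown Fox Jumps Over the Lazy Dog";
        System.out.println(containsWord(text, "br0wn", 1));
        System.out.println(containsWord(text, "br0wn fox", 1));
    }
}
```

A single-word keyword such as "br0wn" is found, but a keyword with a space in it such as "br0wn fox" is compared against one word at a time and is therefore always too far away.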
I've now tried using a Bitap algorithm to check whether the keyword is in the text, but it comes back as true even when the keyword and text are very different. The algorithm can be found here.
final String keyword = "br0wn foxes very nice and hfhjdfgdfgdfgfvffdbdffgjfjfhjgjfdghfghghfg".toLowerCase();
final String text = "The Quick Brown Fox Jumps Over the Lazy Dog".toLowerCase();
final Bitap bitap = new Bitap(keyword, alphabet);
bitap.within(text, 20); // Returns true
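One way to cross-check a result like this is a plain dynamic-programming version of approximate substring matching: the Levenshtein table with the first row set to zero so that a match may start anywhere in the text. This is only a sketch under my own assumptions, not the Bitap implementation linked above; the names `ApproxSubstring`, `minSubstringDistance` and `within` are hypothetical.

```java
import java.util.Locale;

final class ApproxSubstring {

    // Smallest edit distance between the pattern and any substring of the text.
    static int minSubstringDistance(String pattern, String text) {
        int m = pattern.length();
        int[] col = new int[m + 1];
        for (int i = 0; i <= m; i++) {
            col[i] = i; // against an empty substring, i deletions are needed
        }
        int best = col[m];
        for (int j = 1; j <= text.length(); j++) {
            int diag = col[0]; // value from the previous column, one row up
            col[0] = 0;        // a match may start at any position in the text
            for (int i = 1; i <= m; i++) {
                int old = col[i];
                int cost = pattern.charAt(i - 1) == text.charAt(j - 1) ? 0 : 1;
                col[i] = Math.min(Math.min(col[i] + 1, col[i - 1] + 1), diag + cost);
                diag = old;
            }
            best = Math.min(best, col[m]); // best match ending at position j
        }
        return best;
    }

    // Fuzzy contains: true if some substring is within maxErrors edits.
    static boolean within(String text, String keyword, int maxErrors) {
        return minSubstringDistance(keyword.toLowerCase(Locale.ROOT),
                                    text.toLowerCase(Locale.ROOT)) <= maxErrors;
    }

    public static void main(String[] args) {
        String keyword = "br0wn foxes very nice and hfhjdfgdfgdfgfvffdbdffgjfjfhjgjfdghfghghfg";
        String text = "The Quick Brown Fox Jumps Over the Lazy Dog";
        System.out.println(within(text, keyword, 20));
        System.out.println(within(text, "br0wn fox", 1));
    }
}
```

With this check the long keyword above is not within 20 errors of the text, because the keyword is more than 20 characters longer than the whole text and every extra character costs at least one edit. It runs in time proportional to keyword length times text length, so it is more of a reference check than a fast search.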
I've looked into using Lucene. The problem is that much of it is built around creating an index from all the data and then searching that index. That won't work in my case, because I need a method that simply takes a keyword and a text as separate arguments. If there are any resources on performing a fuzzy contains with Lucene without indexing, they would be very useful.
What is the best approach for this?