I want to build a messenger application that filters incoming Strings based on certain keywords. I plan to use Java, but Groovy is also an option.
The keyword list will be static, stored in a text file or CSV.
The list will contain at most 100 keywords (I will never need more than that).
Each incoming string will be at most 200 bytes (UTF-8).
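Because the list is static and small, it can be read once at startup and kept in memory. A minimal sketch, assuming a hypothetical format of one keyword per UTF-8 line, with blank lines and lines starting with `#` ignored (`KeywordLoader`, `load` and `parse` are illustrative names, not an existing API):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class KeywordLoader {

    /** Reads the keyword file once (assumed: UTF-8, one keyword per line). */
    public static List<String> load(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    /** Trims each line and skips blank lines and (hypothetical) '#' comment lines. */
    public static List<String> parse(List<String> lines) {
        List<String> keywords = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                keywords.add(trimmed);
            }
        }
        return keywords;
    }
}
```

If a CSV is used instead, note that naively splitting each line on commas would break regex keywords that contain commas, such as `A{1,3}`.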
I have seen quite a few posts saying that keyword-based string filtering is obsolete, but my application will be simple, so I don't want to deal with NLP.
Keywords may be either regexes or plain words.
I know there are plenty of ways to do this, but I want the fastest one. I have read that a good approach is to use a HashMap, but I don't see how that could be fast when combined with regex.
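One way to combine hashing with regex is to split the keyword list in two: plain words go into a `HashSet` (which is backed by a `HashMap`) for constant-time average lookups, and only the regex keywords are precompiled into `Pattern` objects and tried one by one. This is a sketch under assumptions: the rule "a keyword containing a regex metacharacter is a regex" is a hypothetical heuristic, and `HybridKeywordFilter` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class HybridKeywordFilter {

    // Hypothetical heuristic: any keyword containing one of these characters is treated as a regex.
    private static final String REGEX_META = ".?*+[](){}|^$\\";

    private final Set<String> plainWords = new HashSet<>();
    private final List<Pattern> patterns = new ArrayList<>();

    public HybridKeywordFilter(List<String> keywords) {
        for (String kw : keywords) {
            if (isRegex(kw)) {
                // Compile once up front; matched against whole tokens with matches().
                patterns.add(Pattern.compile(kw, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE));
            } else {
                plainWords.add(kw.toLowerCase(Locale.ROOT));
            }
        }
    }

    private static boolean isRegex(String kw) {
        for (char c : kw.toCharArray()) {
            if (REGEX_META.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    public boolean matches(String message) {
        // Tokens are runs of Unicode letters/digits; everything else is a separator.
        for (String token : message.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator produces one empty token
            }
            // Hash lookup for plain words
            if (plainWords.contains(token.toLowerCase(Locale.ROOT))) {
                return true;
            }
            // Only the regex keywords are evaluated per token
            for (Pattern p : patterns) {
                if (p.matcher(token).matches()) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the example keywords, `DRUGS` and `GAMBLE` end up in the set and only `VODKA.?` is run as a regex.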
For example, an incoming string could be:
String example = "I want to gamble and drink vodka all day";
and the keyword list could contain:
DRUGS
VODKA.?
GAMBLE
The example String should be filtered because it contains at least one word from the keyword list (ignoring case, since the keywords are uppercase).
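A common regex-only alternative to looping over 100 separate patterns is to join all keywords into a single case-insensitive alternation, compile it once, and run one `find()` per message. A minimal sketch, assuming every keyword is a valid Java regex (`CombinedPatternFilter` is an illustrative name):

```java
import java.util.List;
import java.util.regex.Pattern;

public class CombinedPatternFilter {

    private final Pattern combined;

    public CombinedPatternFilter(List<String> keywords) {
        if (keywords.isEmpty()) {
            throw new IllegalArgumentException("keyword list must not be empty");
        }
        // Builds (?:DRUGS)|(?:VODKA.?)|(?:GAMBLE); each keyword is wrapped in a
        // non-capturing group so it stays a self-contained unit.
        StringBuilder sb = new StringBuilder();
        for (String kw : keywords) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append("(?:").append(kw).append(')');
        }
        // Compiled once; CASE_INSENSITIVE so "gamble" matches GAMBLE.
        combined = Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    /** True if any keyword occurs anywhere in the message (no word boundaries). */
    public boolean matches(String message) {
        return combined.matcher(message).find();
    }
}
```

Because `find()` is used without word boundaries, this also flags longer words such as "gambler". Wrapping each keyword in `\b...\b` would restrict matches to whole words, but then glued input like "gambleand" would no longer match.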
EDIT:
After some replies pointed out that regex is slow, I want to find a good solution that does not use regex.
One regex-free approach is to put the keywords in a Set, split the incoming string into an array of words, then iterate over the array and check whether any word is contained in the Set.
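The Set-based approach can be sketched without any regex at all, including the splitting step: the message is scanned character by character and each run of letters/digits is looked up in a `HashSet`. This is a minimal sketch; `SetLookupFilter` is an illustrative name, and because a pattern like `VODKA.?` cannot be stored in a Set, the plain word `VODKA` is assumed instead.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class SetLookupFilter {

    private final Set<String> keywords = new HashSet<>();

    public SetLookupFilter(List<String> words) {
        for (String w : words) {
            keywords.add(w.toLowerCase(Locale.ROOT)); // normalise case once, up front
        }
    }

    /** Splits the message into letter/digit runs by hand (no regex) and checks each run against the set. */
    public boolean matches(String message) {
        StringBuilder token = new StringBuilder();
        for (int i = 0; i <= message.length(); i++) {
            // A trailing sentinel space flushes the last token.
            char c = i < message.length() ? message.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c)) {
                token.append(Character.toLowerCase(c));
            } else if (token.length() > 0) {
                if (keywords.contains(token.toString())) {
                    return true;
                }
                token.setLength(0);
            }
        }
        return false;
    }
}
```

Each lookup is a single hash probe, but only whole tokens are compared, so a keyword glued to another word is never found.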
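This does not work in some cases. For example, someone could enter "I like to gambleand drinkvodka all day". Splitting on spaces yields the tokens "gambleand" and "drinkvodka", neither of which is in the Set, so the message will not match.
That is one of the reasons I see regex as the only way to go for word filtering...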