Given a "large" list of patterns and a "short" text, what is the best/fastest way to find every pattern that occurs as a substring of the text? If a pattern occurs more than once in a text, we would ideally like to find all occurrences.
To be more specific, the texts are streaming queries and the patterns to look for are named entities. A pattern counts as a match only if it appears in full. Training an NER model to tag entities is not an option. By "large" list, I mean a few hundred thousand entities to look up. By "short" text, I mean about 10 words on average.
Example:
Text: the actor who plays the black widow in the avengers.
I am considering tries and FSTs, and I am trying to understand the pros and cons of each in this particular scenario. Any pointers would be appreciated.
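To make the trie option concrete, here is a minimal sketch of a word-level trie lookup. It rests on several assumptions that are not part of the setup above: patterns and text are lowercased and split on whitespace, matches must align to whole words, punctuation has already been stripped, and the three-entry pattern list is made up for illustration. `build_trie` and `find_matches` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

# Sentinel key marking that a complete pattern ends at this node.
# Assumes no real token is ever equal to this string.
END = "$end"


def build_trie(patterns: List[str]) -> Dict:
    """Build a trie keyed by word tokens rather than characters."""
    root: Dict = {}
    for pattern in patterns:
        node = root
        for token in pattern.lower().split():
            node = node.setdefault(token, {})
        node[END] = pattern
    return root


def find_matches(text: str, trie: Dict) -> List[Tuple[int, int, str]]:
    """Return (start_token, end_token_exclusive, pattern) for every match,
    including overlapping matches and repeated occurrences."""
    tokens = text.lower().split()
    matches = []
    for start in range(len(tokens)):
        node = trie
        # Walk the trie from this start position until no pattern continues.
        for end in range(start, len(tokens)):
            node = node.get(tokens[end])
            if node is None:
                break
            if END in node:
                matches.append((start, end + 1, node[END]))
    return matches


# Illustrative pattern list, not real data.
patterns = ["black widow", "the avengers", "avengers"]
trie = build_trie(patterns)
print(find_matches("the actor who plays the black widow in the avengers", trie))
# prints [(5, 7, 'black widow'), (8, 10, 'the avengers'), (9, 10, 'avengers')]
```

With this layout, the work per query is bounded by the number of tokens in the text times the length in tokens of the longest pattern, and does not grow with the number of patterns. The trie itself has to hold all of the entities in memory.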