Given a text (T) and a dictionary (D), how can I find all words from D that occur in T?
A1. One can assume that there are only a few repetitions of characters in T, for example because T is in Chinese.
A2. Iterating over D is, as one may suspect, costly. Thus D should either be preprocessed or broken down, or simply put: multiple iterations over D should be avoided.
A3. The maximum length of a word is L, which is small compared to the length of the text.
B1. The simplest solution might be to just iterate over D for every substring of sensible length (at most L) in T. This method would guarantee that all words are found. It seems vastly inefficient, however.
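To make B1 concrete, here is a minimal sketch of how I picture it. The function name `find_words_naive` and the parameter `max_len` (standing in for L from A3) are made up for illustration, and D is assumed to be a plain list of strings.

```python
def find_words_naive(text, dictionary, max_len):
    """Return every dictionary word that occurs in text (brute force, as in B1)."""
    found = set()
    for start in range(len(text)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(text):
                break  # substring would run past the end of T
            candidate = text[start:end]
            # Linear scan over D for every substring: this is the costly part.
            for word in dictionary:
                if word == candidate:
                    found.add(word)
                    break
    return found
```

With |T| positions, up to L lengths per position and a full scan of D each time, this does on the order of |T| * L * |D| string comparisons, which is why it looks so inefficient to me.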
B2. Another idea would be to iterate over the text once, collect all characters that occur in T in a set, and then proceed as in B1, but only with the words in D that can be built from those characters.
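A rough sketch of B2 as I read it: use the character set of T to throw away every word in D that contains a character not present in T, then test only the survivors. The names `filter_dictionary_by_alphabet` and `find_words_prefiltered` are hypothetical, and for brevity the second step uses Python's substring operator `in` rather than the substring loop from B1, so this is a simplification, not the exact procedure.

```python
def filter_dictionary_by_alphabet(text, dictionary):
    """Keep only the words whose characters all occur somewhere in text."""
    alphabet = set(text)  # one pass over T
    # issuperset accepts any iterable, so a word (string) can be passed directly.
    return [word for word in dictionary if word and alphabet.issuperset(word)]


def find_words_prefiltered(text, dictionary):
    """Prefilter D by the alphabet of T, then check each surviving word."""
    candidates = filter_dictionary_by_alphabet(text, dictionary)
    return {word for word in candidates if word in text}
```

This still touches every word in D once for the filter, so it only pays off if T uses a small part of the characters that appear in D.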
B3. This variation could work like B2, but would additionally rely on (or assert) that D is in lexicographical order. That way, it would only need to check words that start with the same character as my current position in T. Possibly I could also use a lookahead over the characters following my current character in T. I would iterate over D just once, but over T multiple times. That seems bearable, however.
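One way the sorted-dictionary idea from B3 could look, as a sketch only: instead of scanning D, binary-search it for each growing prefix taken from the current position in T, and stop extending as soon as no word in D starts with that prefix (this plays the role of the lookahead). The name `find_words_sorted` and the parameter `max_len` (L from A3) are made up, and "lexicographical order" is assumed to mean Python's default string ordering, i.e. what `sorted()` produces.

```python
from bisect import bisect_left


def find_words_sorted(text, sorted_words, max_len):
    """Find dictionary words in text using binary search on a sorted word list."""
    found = set()
    n = len(sorted_words)
    for start in range(len(text)):
        last_end = min(start + max_len, len(text))
        for end in range(start + 1, last_end + 1):
            prefix = text[start:end]
            # First word >= prefix; if any word starts with prefix, it is this one.
            i = bisect_left(sorted_words, prefix)
            if i == n or not sorted_words[i].startswith(prefix):
                break  # no word in D begins with this prefix, so stop extending
            if sorted_words[i] == prefix:
                found.add(prefix)
    return found


words = sorted(["ab", "abc", "bc", "cab", "zz"])
print(find_words_sorted("abcab", words, max_len=3))
```

Each lookup costs about log |D| comparisons, so the total is roughly |T| * L * log |D| instead of |T| * L * |D|, at the price of sorting D once up front.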
B4. Here I would proceed as in B3, but reorder D so that words that are more likely to occur in T are checked earlier. The problem: how do I find out which words are more likely to occur? I would first have to digest a lot more data, and then be sure that what I measure is actually what I want to measure.
Surely, there are many other possibilities, likely more sophisticated ones. But what is the current state of the art? How can one do this / approach this problem best?