Problem
Given a list of strings, find the strings from the list that appear in the given text.
Example
list = ['red', 'hello', 'how are you', 'hey', 'deployed']
text = 'hello, This is shared right? how are you doing tonight'
result = ['red', 'how are you', 'hello']
'red' is in the result because 'shared' contains 'red' as a substring.
- This is very similar to this question, except that the word we need to look for can be a substring as well.
- The list is pretty large and grows as users are added, whereas the text stays roughly the same length throughout.
- I was thinking of a solution whose time complexity depends on the length of the text rather than on the list of words, so that it stays scalable even as lots of users are added.
Solution
- I build a trie from the given list of words
- Run a DFS on the text, checking the current word against the trie (see the sketch below)
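Here is a minimal sketch of the one-time trie build, assuming the pygtrie library (its CharTrie provides the has_subtrie/has_key calls used in the pseudo-code below); build_trie is just an illustrative helper name, not a fixed part of the solution:

import pygtrie

def build_trie(words):
    # One-time build; only the keys matter, so any value works
    trie = pygtrie.CharTrie()
    for word in words:
        trie[word] = True
    return trie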
Pseudo-code
def find_word(trie, text, word_so_far, index, result):
    # result is a set collecting every word from the list found so far.
    # If word_so_far is a complete key, record it and keep extending,
    # since a longer key may still match (e.g. if both 'he' and 'hello' were keys)
    if trie.has_key(word_so_far):
        result.add(word_so_far)
    if index >= len(text):
        return
    # If word_so_far is not a prefix of any key, prune this branch
    if not trie.has_subtrie(word_so_far):
        return
    # Extend the current word we are searching
    find_word(trie, text, word_so_far + text[index], index + 1, result)
    # Start a new word from the next index
    find_word(trie, text, "", index + 1, result)
The problem with this is that, although the runtime now depends on len(text), the search itself runs in O(2^n) time, since every index branches into "extend the current word" and "start a new word". (Building the trie is a one-time cost amortized over many texts, so that part is fine.)
I do not see any overlapping subproblems to memoize and improve the runtime with, either.
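To make the blow-up concrete, here is a hypothetical counter (illustration only, not part of the solution) that mirrors the branching of find_word on a worst-case input where the trie never prunes, i.e. every prefix of the text is itself a key:

def count_calls(trie, text, word_so_far, index):
    # Mirrors find_word's control flow, returning the number of calls made
    if index >= len(text) or not trie.has_subtrie(word_so_far):
        return 1
    return (1 + count_calls(trie, text, word_so_far + text[index], index + 1)
              + count_calls(trie, text, "", index + 1))

for n in (8, 12, 16):
    trie = build_trie(['a' * k for k in range(1, n + 1)])
    print(n, count_calls(trie, 'a' * n, "", 0))  # call count grows roughly as 2^n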
Can you suggest any way I can achieve a runtime that depends on the given text, as opposed to the list of words (which can be pre-processed and cached), and is also faster than this?