Search for a sub-string in any position in a given array of strings using Trie

Question

I have an array of n strings. I want to select all the elements of the array that contain a given string. For example:

input = `"ra"`, array = `["abas", "aras", "as", "ask", "asi", "aso", "atras", "ram"]`, output = `["aras", "atras", "ram"]`

My solution is brute force, O(array.length * pattern.length * text.length). Is there any way to do this faster? Or could I somehow use a Trie (which apparently only supports searching from the start of a string, not from an arbitrary position)?
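For reference, a minimal sketch of the brute-force approach described above, assuming each element is checked independently and matching stops at the first hit. The function name `brute_force_filter` is hypothetical; it is not taken from the question.

```python
def brute_force_filter(words, pattern):
    """Return the words that contain `pattern`, trying every start position naively."""
    result = []
    m = len(pattern)
    for word in words:                           # array.length iterations
        for start in range(len(word) - m + 1):   # up to text.length start positions
            # compare up to pattern.length characters at this start position
            if all(word[start + k] == pattern[k] for k in range(m)):
                result.append(word)
                break                            # one occurrence is enough for this word
    return result


words = ["abas", "aras", "as", "ask", "asi", "aso", "atras", "ram"]
print(brute_force_filter(words, "ra"))  # ['aras', 'atras', 'ram']
```

In Python, `[w for w in words if pattern in w]` does the same job with the built-in substring search.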

What do you mean by O(n^3) here, i.e. what is n? As currently formulated, you can use any single-pattern substring search algorithm, for example Rabin-Karp or KMP. Perhaps you have additional conditions: are you going to use the same array for many patterns? — MBo, Apr 07 '18 at 19:26
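A minimal Rabin-Karp sketch for filtering the array, assuming a polynomial rolling hash over code points with an arbitrary base and prime modulus (both are illustrative choices, not part of the comment). Every hash match is verified by a direct comparison, so collisions cannot produce false positives. The names `rabin_karp_contains` and `rabin_karp_filter` are hypothetical.

```python
def rabin_karp_contains(text, pattern, base=256, mod=1_000_000_007):
    """Return True if `pattern` occurs in `text`, using a rolling hash."""
    m, n = len(pattern), len(text)
    if m == 0:
        return True
    if m > n:
        return False
    high = pow(base, m - 1, mod)  # weight of the leading character in a window
    p_hash = t_hash = 0
    for i in range(m):  # hash the pattern and the first window of the text
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    for start in range(n - m + 1):
        # equal hashes only suggest a match; compare characters to rule out collisions
        if p_hash == t_hash and text[start:start + m] == pattern:
            return True
        if start + m < n:  # roll the window: drop text[start], append text[start + m]
            t_hash = ((t_hash - ord(text[start]) * high) * base + ord(text[start + m])) % mod
    return False


def rabin_karp_filter(words, pattern):
    return [w for w in words if rabin_karp_contains(w, pattern)]
```

The expected running time is linear in the total text length, but many collisions can push it toward O(nm), which matches the caveat given further down the thread.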
@Mbo, Rabin-Karp and KMP are still polynomial; is there any way to reach linear time? — Rami Chasygov, Apr 07 '18 at 19:44
KMP is linear, O(m + n) (pattern length + total text length); Rabin-Karp is linear on average (its worst case is O(nm)). You cannot expect better performance in the general case: you have to read at least all of the pattern and text characters. — MBo, Apr 07 '18 at 20:03
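To illustrate the O(m + n) bound, here is a minimal KMP sketch, assuming the failure (prefix) table is built once for the pattern and reused for every word, so the total cost is O(m + sum of word lengths). The function names are hypothetical.

```python
def prefix_function(pattern):
    """failure[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]  # fall back to the next shorter border
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_contains(text, pattern, failure):
    """Scan `text` once; never moves backwards in the text."""
    if not pattern:
        return True
    k = 0  # number of pattern characters currently matched
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
            if k == len(pattern):
                return True
    return False


def kmp_filter(words, pattern):
    failure = prefix_function(pattern)  # O(m), computed once for the whole array
    return [w for w in words if kmp_contains(w, pattern, failure)]
```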
@Mbo, what about a Trie: is it possible to modify it to search from any position, maybe by using hashes as keys? — Rami Chasygov, Apr 07 '18 at 20:17
If the `input` string is short, as in your example, brute force will beat any other solution because its operations are extremely simple. [KMP](https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm) usually outperforms brute force in cases like yours, but you should test which one is faster on your data (asymptotic time complexity and actual running time are different things). — user3707125, Apr 08 '18 at 02:49
If you want to search the same array multiple times with different inputs, you can build a [suffix tree](https://en.wikipedia.org/wiki/Suffix_tree) of the array (similar to a trie, but not limited to searching from the start). Otherwise a string search algorithm would be suitable (Boyer-Moore is sublinear). — CoronA, Apr 08 '18 at 05:26
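As an illustration of the Boyer-Moore family mentioned above, here is a minimal sketch of the Horspool simplification, which keeps only the bad-character rule. Full Boyer-Moore also uses a good-suffix rule; this sketch omits it. Skipping ahead by up to m characters is what makes typical searches sublinear, although the worst case of this variant is O(nm). The names are hypothetical.

```python
def horspool_contains(text, pattern):
    """Boyer-Moore-Horspool: compare right to left, skip using a bad-character table."""
    m, n = len(pattern), len(text)
    if m == 0:
        return True
    # shift[ch] = distance from the last occurrence of ch in pattern[:-1] to the pattern's end;
    # later occurrences overwrite earlier ones, keeping the smallest (safe) shift
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return True
        # characters absent from pattern[:-1] let us jump the whole pattern length
        pos += shift.get(text[pos + m - 1], m)
    return False


def horspool_filter(words, pattern):
    return [w for w in words if horspool_contains(w, pattern)]
```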
@CoronA, I'm not sure that a [suffix tree](https://en.wikipedia.org/wiki/Suffix_tree) can work for me. According to this [article](http://www.allisons.org/ll/AlgDS/Tree/Suffix/), if I type `si` it returns `sissippi` and `sippi`, but not the word `mississippi`. — Rami Chasygov, Apr 09 '18 at 11:14
That depends on what you store in the leaves of the tree. You assume that you store nothing and return the suffix. But you can store the whole word in the leaf value and return it. If your query is a prefix of multiple suffixes, return the set of all reachable leaf values. — CoronA, Apr 09 '18 at 11:33
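A minimal sketch of this idea, assuming an uncompressed generalized suffix trie rather than a real suffix tree. Building it is O(L^2) per word, unlike linear-time suffix tree construction such as Ukkonen's algorithm. Instead of storing whole words, each suffix's end node stores the id of the word it came from, and a query collects the ids from every node reachable below the match. The class names are hypothetical.

```python
class SuffixTrieNode:
    def __init__(self):
        self.children = {}     # char -> SuffixTrieNode
        self.word_ids = set()  # ids of words having a suffix that ends exactly here


class GeneralizedSuffixTrie:
    def __init__(self, words):
        self.words = list(words)
        self.root = SuffixTrieNode()
        for word_id, word in enumerate(self.words):
            for start in range(len(word)):  # insert every suffix of every word
                node = self.root
                for ch in word[start:]:
                    node = node.children.setdefault(ch, SuffixTrieNode())
                node.word_ids.add(word_id)  # the "leaf value" points back to the word

    def search(self, pattern):
        """Return the words containing `pattern`, in their original array order."""
        node = self.root
        for ch in pattern:  # walk down as in an ordinary trie prefix search
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = set(), [node]
        while stack:  # collect values from every reachable suffix end
            current = stack.pop()
            found |= current.word_ids
            stack.extend(current.children.values())
        return [self.words[i] for i in sorted(found)]
```

With this variant, `GeneralizedSuffixTrie(["mississippi"]).search("si")` returns the whole word `mississippi` instead of the suffixes `sissippi` and `sippi`.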
@CoronA, if I store words in the leaves, the space complexity would be large: for an English dictionary search app, for example, hundreds of words would be stored repeatedly across hundreds of suffixes. — Rami Chasygov, Apr 10 '18 at 17:40
@CoronA, is there a better way to reduce the space complexity? — Rami Chasygov, Apr 10 '18 at 17:50
The overhead for the whole words should be a constant factor. Hundreds will not be a problem, millions will be. In this case you will have to persist the suffix tree. I think it would be best to add your additional requirements to your problem description. — CoronA, Apr 10 '18 at 18:06
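On the space question, one commonly used alternative (swapped in here; it is not what CoronA suggested) is a generalized suffix array. It stores one `(word_id, offset)` pair per suffix and never copies suffix text, then answers a query with two binary searches. Below is a minimal sketch assuming a naive comparison sort, which is fine for a dictionary-sized input but slower to build than dedicated suffix array algorithms. The function names are hypothetical.

```python
from functools import cmp_to_key


def build_suffix_array(words):
    """Sorted (word_id, offset) pairs, one per suffix; suffixes are sliced only while comparing."""
    def compare(a, b):
        sa, sb = words[a[0]][a[1]:], words[b[0]][b[1]:]
        return (sa > sb) - (sa < sb)

    pairs = [(w, i) for w, word in enumerate(words) for i in range(len(word))]
    pairs.sort(key=cmp_to_key(compare))
    return pairs


def search_suffix_array(words, suffix_array, pattern):
    """Return the words containing `pattern`, in their original array order."""
    m = len(pattern)

    def prefix_at(k):  # first m characters of the k-th smallest suffix
        w, i = suffix_array[k]
        return words[w][i:i + m]

    lo, hi = 0, len(suffix_array)
    while lo < hi:  # lower bound: first suffix whose m-prefix >= pattern
        mid = (lo + hi) // 2
        if prefix_at(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(suffix_array)
    while lo < hi:  # upper bound: first suffix whose m-prefix > pattern
        mid = (lo + hi) // 2
        if prefix_at(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    ids = {suffix_array[k][0] for k in range(start, lo)}
    return [words[w] for w in sorted(ids)]
```

Sorting the suffixes also sorts their first m characters, so all suffixes that start with the pattern form one contiguous range. A query therefore costs O(m log N) character comparisons, where N is the total number of suffixes.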
