I need to solve a problem: find the longest substring in two words with a suffix tree. I built a suffix tree for the first and second word, but how can I find the longest substring in the two words? Could you suggest a possible algorithm for solving this problem?
You mean longest common substring? – kiner_shah Oct 30 '21 at 09:35
1 Answer
The trick is to use a single suffix tree for both words:
- First, join the strings using characters such as `$` and `#` that are not part of either string, i.e. the strings `abra` and `abracadabra` get joined to `abra$abracadabra#`.
- Then build a suffix tree from that.
- Now, from the leaves ending with `$` (the suffixes of the first word), climb up and mark nodes as part of word1.
- Do the same for the leaves ending with `#`, marking them as part of word2.
- Now we can do a simple `DFS` traversal from the root, as the longest substring will be some path from the root (only checking nodes that are part of both words).

Complexity: O(a+b) (suffix tree building, if built the fast way) + O(a+b) (DFS) = O(a+b), where a and b are the lengths of the two words.
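The marking idea can be tried out with a minimal sketch, under some loud assumptions. It does not build a linear-time suffix tree: it inserts every suffix into an uncompressed suffix trie, which takes quadratic time and space, so it only illustrates the marking and the DFS, not the O(a+b) bound. The names `Node` and `longest_common_substring` are hypothetical, and marking nodes during insertion stands in for climbing up from the leaves.

```python
class Node:
    """One node of an uncompressed suffix trie (hypothetical helper)."""
    __slots__ = ("children", "mask")

    def __init__(self):
        self.children = {}  # character -> Node
        self.mask = 0       # bit 1: reached from word 1, bit 2: reached from word 2


def longest_common_substring(a, b):
    """Naive O(n^2) sketch; a real suffix tree would be built in linear time."""
    if "$" in a + b or "#" in a + b:
        raise ValueError("sentinels must not occur in the input words")
    joined = a + "$" + b + "#"

    root = Node()
    for start in range(len(joined)):
        # Suffixes that start at or before '$' belong to word 1, the rest to word 2.
        bit = 1 if start <= len(a) else 2
        node = root
        for ch in joined[start:]:
            node = node.children.setdefault(ch, Node())
            # Marking every node on the insertion path has the same effect
            # as climbing from the leaf up to the root afterwards.
            node.mask |= bit

    # Iterative DFS from the root, descending only into nodes marked by both words.
    best = ""
    stack = [(root, "")]
    while stack:
        node, path = stack.pop()
        if len(path) > len(best):
            best = path
        for ch, child in node.children.items():
            if child.mask == 3:
                stack.append((child, path + ch))
    return best


print(longest_common_substring("abra", "abracadabra"))  # abra
print(repr(longest_common_substring("a", "bbbb")))      # ''
```

A path that contains `$` can only come from a suffix starting in the first word, so no node below `$` is ever marked by both words, and the search never has to test for sentinels explicitly. When several common substrings share the maximum length, this sketch returns whichever one the DFS reaches first.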

Photon
A generalised suffix tree is the way to go, but DFS and skipping paths with the sentinel is not quite enough. You definitely don’t want to search past $ since that won’t be a shared string, but there are other strings that aren’t shared either. If the string is a$bbbb the longest string without $ is bbbb which only occurs once. You only want inner nodes. But there, the longest is bbb. You want inner nodes with leaves from both strings. A DFS identifying which nodes have children in both strings first, and then a DFS identifying repeats will do it, and still in O(a+b) – Thomas Mailund Oct 31 '21 at 05:20