
I need to solve a problem: find the longest common substring of two words using a suffix tree. I built a suffix tree for the first word and one for the second word, but how can I find the longest substring the two words have in common? Could you suggest an algorithm for this?

QuickDzen

1 Answer


The trick is to build a single suffix tree for both words (a generalized suffix tree):

  1. First join the strings, using separator characters that do not occur in either string, such as $ and # (one after each word).

    For example, the strings abra and abracadabra are joined into abra$abracadabra#.

  2. Then build a suffix tree from the joined string.

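  3. Now, from every leaf whose suffix starts in word1 (its path contains $), climb up and mark each node on the way as part of word1. You can stop climbing as soon as you reach a node that is already marked, which keeps this step linear.

  4. Do the same for the leaves whose suffix starts in word2 (their path contains # but no $), marking those nodes as part of word2.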

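  5. Finally, do a simple DFS from the root, visiting only nodes marked as part of both words. Every common substring is spelled by some path from the root, so the longest common substring is the one spelled by the deepest node (counted in characters) that is marked for both words.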
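The five steps can be sketched in Python as follows. To keep the sketch short it uses a naive, uncompressed suffix trie (quadratic time and space) instead of a linear-time suffix tree, so it does not reach the O(a+b) bound; the marking and DFS logic are the same. The names `Node` and `longest_common_substring` are hypothetical, and `$` and `#` are assumed not to occur in either word.

```python
class Node:
    """One node of an (uncompressed) suffix trie; a stand-in for a suffix tree node."""
    __slots__ = ("children", "in1", "in2")

    def __init__(self) -> None:
        self.children = {}
        self.in1 = False  # reached by some suffix that starts in word1
        self.in2 = False  # reached by some suffix that starts in word2


def longest_common_substring(word1: str, word2: str) -> str:
    # Step 1: join the words with separators assumed absent from both words.
    if any(sep in word1 + word2 for sep in "$#"):
        raise ValueError("'$' and '#' must not occur in either word")
    text = word1 + "$" + word2 + "#"
    n1 = len(word1)

    # Steps 2-4: insert every suffix. Marking each node on the way down has the
    # same effect as climbing up from that suffix's leaf and marking.
    root = Node()
    for i in range(len(text)):
        from_word1 = i <= n1  # suffixes starting at or before '$' belong to word1
        node = root
        for ch in text[i:]:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = Node()
            node = child
            if from_word1:
                node.in1 = True
            else:
                node.in2 = True

    # Step 5: DFS from the root through nodes marked for both words;
    # the deepest such node spells a longest common substring.
    best = ""
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        if len(label) > len(best):
            best = label
        for ch, child in node.children.items():
            if child.in1 and child.in2:
                stack.append((child, label + ch))
    return best


print(longest_common_substring("abra", "abracadabra"))  # abra
```

A path whose label contains $ or # can never be marked for both words (word2's suffixes contain no $, and a word1 suffix only reaches # after passing $), so the DFS never needs to check for the separators explicitly. When several common substrings share the maximum length, which one is returned depends on the traversal order.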

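Complexity: O(a+b) to build the suffix tree (with a linear-time construction such as Ukkonen's algorithm) plus O(a+b) for the DFS, so O(a+b) overall, where a and b are the lengths of the two words.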

Photon
    A generalised suffix tree is the way to go, but a DFS that merely skips paths through the sentinel is not quite enough. You definitely don’t want to search past $, since that won’t be a shared string, but there are other strings that aren’t shared either. If the string is a$bbbb, the longest string without $ is bbbb, which occurs only once. You only want inner nodes, but there the longest is bbb. You want inner nodes with leaves from both strings. A first DFS identifying which nodes have leaves from both strings, and then a DFS identifying repeats, will do it, still in O(a+b). – Thomas Mailund Oct 31 '21 at 05:20
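The two-pass idea from the comment can be sketched in Python too. This is a hypothetical sketch that again substitutes an uncompressed suffix trie for a real (compacted) suffix tree: the first pass propagates, bottom-up, a bitmask of which words each node's leaves come from, and the second pass walks only nodes whose leaves come from both words. All function names are illustrative, and `$` and `#` are assumed absent from both words.

```python
class TrieNode:
    """Node of an uncompressed suffix trie over word1 + '$' + word2 + '#'."""
    __slots__ = ("children", "origin")

    def __init__(self) -> None:
        self.children = {}
        # Bitmask of the words whose suffixes have a leaf below this node:
        # 1 = word1, 2 = word2, 3 = both.
        self.origin = 0


def build_trie(word1: str, word2: str) -> TrieNode:
    if any(sep in word1 + word2 for sep in "$#"):
        raise ValueError("'$' and '#' must not occur in either word")
    text = word1 + "$" + word2 + "#"
    root = TrieNode()
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        # '#' occurs only once, so every suffix ends at its own leaf;
        # record which word the suffix starts in.
        node.origin = 1 if i <= len(word1) else 2
    return root


def mark_origins(root: TrieNode) -> None:
    """Pass 1: OR each node's children's masks into it, bottom-up."""
    preorder = []
    stack = [root]
    while stack:
        node = stack.pop()
        preorder.append(node)
        stack.extend(node.children.values())
    # Reversed preorder visits every child before its parent.
    for node in reversed(preorder):
        for child in node.children.values():
            node.origin |= child.origin


def deepest_shared(root: TrieNode) -> str:
    """Pass 2: the deepest node with leaves from both words spells the answer."""
    best = ""
    stack = [(root, "")]
    while stack:
        node, label = stack.pop()
        if len(label) > len(best):
            best = label
        for ch, child in node.children.items():
            # A parent's mask contains its children's masks, so if a child is
            # not shared, nothing below it is shared either.
            if child.origin == 3:
                stack.append((child, label + ch))
    return best


def longest_common_substring_two_pass(word1: str, word2: str) -> str:
    root = build_trie(word1, word2)
    mark_origins(root)
    return deepest_shared(root)


print(repr(longest_common_substring_two_pass("a", "bbbb")))  # ''
```

For word1 = "a" and word2 = "bbbb" (joined as a$bbbb#), the longest separator-free path is bbbb, but no node below the root has leaves from both words, so the function returns an empty string, as the comment's argument predicts.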