2

From Data Structures and Algorithm Analysis in Java, Weiss:

explicit and implicit representation of the suffix tree for 'banana'

Weiss writes:

  1. In the leaves, we use the index where the suffix begins (as in the suffix array)
  2. In the internal nodes, we store the number of common characters matched from the root until the internal node; this number represents the letter depth.

My question: given the input string (e.g. 'banana') and the implicit representation of the suffix tree, what would a good algorithm for substring search look like? The algorithms I've seen assume a different representation of the tree. I'd like to do substring search without converting to a different tree representation.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
ivme
  • 548
  • 5
  • 14

1 Answers1

1

I've never seen this representation before. It's more common to represent the labels on the edges as pairs of integers delineating a range of characters from the original string, which would let you more easily determine what the characters on the edges are (you can just look back at the original string at those characters on an as-needed basis to see whether they match the substring you're looking at).

I'm fairly certain that this compressed representation is not good at matching substrings. In order to follow an edge, you need to know what characters are on that edge, but you can't tell what those characters are unless you scan over the characters of the original string to find something that could conceivably match. You could consider descending down into the subtree to find a suffix there and use that to reconstruct the characters, but this requires extra time and breaks the time bounds you'd expect of a suffix tree.

My best guess here is that the author is mistaken about how to represent a suffix tree in a small amount of space.

templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065
  • I doubt the author is mistaken, but I may be asking the wrong question. It could be that this particular representation isn't good for matching substrings, as you say; Weiss is silent on that subject. Just wondering whether anyone out there knows if there is a good method that uses this representation. – ivme Aug 09 '17 at 20:44
  • @Chad Given the [Amazon reviews of the book](https://www.amazon.com/Data-Structures-Algorithm-Analysis-Java/dp/0132576279) I would not be surprised if this turned out to just be an error on the author's part. I've taught suffix trees in an [advanced data structures class](http://web.stanford.edu/class/archive/cs/cs166/cs166.1166/) and spent a lot of time reading up on them, and I've never seen this representation used anywhere before. – templatetypedef Aug 09 '17 at 21:05
  • Thanks. I'm glad that folks more knowledgeable than I also find this representation a little strange. – ivme Aug 09 '17 at 21:15