Is there a fast (O(1) time) way to generate the suffix tree of the string S[2..m] from the suffix tree of the string S[1..m]?

I am familiar with Ukkonen's algorithm, so I know how to quickly build the suffix tree of S[1..m+1] from the suffix tree of S[1..m], but I couldn't adapt the algorithm to the reverse situation.

keelar
user2356167
  • 2
    I guess not. Basically, what we need to do is delete the suffix S[1..m] from the suffix tree of S[1..m]. What makes you think that a constant-time algorithm exists? – Faraway May 06 '13 at 22:36
  • 2
    If I am not mistaken, the difficulty is to identify which leaf node corresponds to `S[1..m]`. Once you have the leaf, I think (but haven't tried to actually write down a proof) that removing that leaf and (if necessary) the internal node that points to it should be O(1). Finding the leaf is O(m), but you could use O(1) extra space to maintain a pointer to the deepest leaf in the tree, which would reduce the leaf-finding time to O(1). After deleting the leaf, you'd have to update that pointer, but that can be done in O(1) amortized time if you have suffix links in the tree. – jogojapan May 07 '13 at 05:15
  • Ukkonen's algorithm achieved O(n) complexity by building suffix tree from left to right. Algorithms prior to that are building it from right to left, and all failed to achieve O(n). So I guess not. – Chen Pang Sep 23 '13 at 00:41

1 Answer

Well, as @jogojapan says, to get the S[2..m] tree from the S[1..m] tree we need to:

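  • Find the position-0 leaf L, that is, the leaf for the longest suffix S[1..m].
  • If L has more than one sibling, or L's parent is the root, delete the pointer from L's parent to L.
  • If L has exactly one sibling and L's parent is not the root, removing L would leave the parent with a single child, so bypass the parent: change the pointer from L's grandparent to L's parent so that it points to L's sibling instead, and prepend the parent's edge label to the sibling's edge label.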
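The steps above can be sketched in Python roughly as follows. This is only an illustration under several assumptions that are not part of the question: the string ends in a unique terminator such as `$`, nodes carry parent pointers, edge labels are stored as plain strings (a real implementation would store index pairs so the merge is O(1)), and the tree is built with a naive quadratic `build_naive` rather than Ukkonen's algorithm. All names (`Node`, `build_naive`, `find_leaf`, `delete_leaf`, `shape`) are made up for the example.

```python
class Node:
    def __init__(self, label="", parent=None, start=None):
        self.label = label    # edge label from the parent to this node
        self.parent = parent  # parent pointer (an assumption of this sketch)
        self.children = {}    # first character of a child's label -> child
        self.start = start    # suffix start position for leaves, None otherwise


def insert(root, suffix, start):
    """Insert one suffix by walking down from the root, splitting an edge if needed."""
    node, rest = root, suffix
    while True:
        child = node.children.get(rest[0])
        if child is None:
            node.children[rest[0]] = Node(rest, node, start)
            return
        k = 0  # length of the common prefix of rest and child.label
        while k < len(child.label) and k < len(rest) and child.label[k] == rest[k]:
            k += 1
        if k == len(child.label):
            node, rest = child, rest[k:]  # edge fully matched, keep descending
            continue
        # Mismatch inside the edge: split it and hang the new leaf off the split node.
        mid = Node(child.label[:k], node)
        node.children[rest[0]] = mid
        child.label, child.parent = child.label[k:], mid
        mid.children[child.label[0]] = child
        mid.children[rest[k]] = Node(rest[k:], mid, start)
        return


def build_naive(s):
    """Naive O(m^2) construction; assumes s ends in a unique terminator."""
    root = Node()
    for i in range(len(s)):
        insert(root, s[i:], i)
    return root


def find_leaf(root, start):
    """Step 1: locate the leaf for the suffix starting at `start` (linear search)."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.start == start:
            return node
        stack.extend(node.children.values())
    return None


def delete_leaf(root, leaf):
    """Steps 2 and 3: unlink the leaf and bypass its parent if it is left with one child."""
    parent = leaf.parent
    del parent.children[leaf.label[0]]
    if parent is not root and len(parent.children) == 1:
        (sibling,) = parent.children.values()
        sibling.label = parent.label + sibling.label  # merge the two edge labels
        sibling.parent = parent.parent
        parent.parent.children[parent.label[0]] = sibling


def shape(node):
    """Canonical nested form of a tree, ignoring leaf positions, for comparisons."""
    return sorted((c.label, shape(c)) for c in node.children.values())


root = build_naive("aab$")
delete_leaf(root, find_leaf(root, 0))
print(shape(root) == shape(build_naive("ab$")))
```

Note that `delete_leaf` itself only touches a constant number of pointers; the linear cost sits in `find_leaf`. The remaining leaves also keep their old start positions, so they are offset by one relative to S[2..m].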

@jogojapan further suggests that you keep a pointer to the deepest leaf in the tree. There are two problems with that: first, L isn't necessarily the deepest leaf in the tree when depth is counted in nodes, as Wikipedia's example shows; second, if you want to output the same type of data structure as you received, then after removing L you need to find the new position-0 leaf, which takes O(m) time anyway.

(What you could do is construct an array of pointers to each leaf in O(m) time and counting-sort them by position in another O(m) time. Then you'd be able to construct all the trees { S[t..m] : 1 <= t <= m } in constant amortized time each.)

Assuming you're not interested in amortized time though, let's prove that what you ask for is impossible.

  • We know any algorithm to modify the suffix tree of S[1..m] must start at the root: it can't start anywhere else because we know nothing about the underlying concrete data structure, and we don't know that the tree's nodes have parent pointers, so the only position the whole tree is accessible from is the root.
  • We also know that it must locate the position-0 leaf before it can hope to modify the data structure into the suffix tree for S[2..m]. To do this, it must obviously traverse every node between the root and the position-0 leaf.
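  • Thing is, consider the suffix tree of a^m (the character a repeated m times, with a unique end marker appended so that every suffix ends at a leaf): the path from the root to the position-0 leaf passes through m-1 internal nodes. So any algorithm must visit at least m-1 nodes, and therefore takes Ω(m) time in the worst case.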
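To make the a^m claim concrete, here is a small brute-force check that counts the internal nodes between the root and the position-0 leaf without building the tree at all. It relies on the fact that a prefix of S is an explicit internal node exactly when it occurs in S followed by at least two different characters. The function name is made up, the `$` terminator is an assumption, and the check is cubic in the worst case, so treat it as a sanity test rather than an algorithm.

```python
def internal_nodes_above_leaf0(s):
    """Count internal nodes (excluding the root) on the path to the leaf of suffix 0.

    Assumes s ends in a unique terminator such as '$'.
    """
    count = 0
    for k in range(1, len(s)):
        prefix = s[:k]
        # Characters that follow some occurrence of this prefix in s.
        followers = {s[i + k] for i in range(len(s) - k) if s.startswith(prefix, i)}
        if len(followers) >= 2:  # the prefix branches, so it is an explicit node
            count += 1
    return count


m = 6
print(internal_nodes_above_leaf0("a" * m + "$"))  # 5, i.e. m - 1
print(internal_nodes_above_leaf0("banana$"))      # 0: the leaf hangs off the root
```

The second call also illustrates the earlier point about depth: in `banana$` the position-0 leaf is a direct child of the root, so it is not the deepest leaf when depth is counted in nodes.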
Andy Jones