1

The title pretty much.

I created a suffix array in O(n) time using the DC3 algorithm. I then created an LCP array using Kasai's algorithm in O(n) time. Now I need to create a suffix tree from the two arrays I have. How does one do that? I looked at journal papers and looked around using Google but I could not find a way to do it.

There is one coursera video that I came across that describes the process but they do not state what method they use and I doubt it is a linear time algorithm.

1 Answers1

2

It's actually very simple. The suffix array tells you the sequence of suffixes that you encounter if you do a left-to-right depth-first traversal of the suffix tree. The LCP array tells you how far up you need to go before starting a new edge corresponding to the next suffix. Assuming the string s has some unique character at the end (so each suffix is represented by a leaf node), the algorithm looks roughly like this:

let root = new node
let p = new node
make p a child of root with edge label S[0] (the least suffix)
for (int i = 1; i < s.size(); i++) {
    let l = LCP[i-1] (the LCP length of S[i] and S[i-1])
    let prevP = null
    while ((depth of p) > l) {
        // note that "depth" means total edge length from root to p, not
        // total number of edges from root to p
        prevP := p
        p := (parent of p)
    }
    if ((depth of p) == l) {
        let q = new node
        make q a child of p, with edge label S[i][l...]
        p := q
    } else {
        // we need to "split" the edge from p to prevP
        let q = new node
        let prevDepth = depth of prevP
        unlink prevP from p
        make q a child of p, with edge label S[i-1][(depth of p)...(l - 1)]
        make prevP a child of q, with edge label S[i-1][l...(prevDepth - 1)]
        let r = new node
        make r a child of q, with edge label S[i][l...]
        p := r
    }
}
return root
Brian Bi
  • 111,498
  • 10
  • 176
  • 312
  • Thank you for your answer! Does this algorithm have a name? – hussain sagar Aug 16 '19 at 18:23
  • @hussainsagar I have no idea. – Brian Bi Aug 16 '19 at 18:25
  • I have a doubt about what you mean by depth. You have said depth is the total number of edges from the root to p and not edge length. How are edge length and the number of edges different? – hussain sagar Aug 16 '19 at 19:23
  • @hussainsagar Sorry, when I talk about the length of an edge, I mean the length of the substring label that that edge carries. For the string "abcabd", the suffix tree would have a node that is linked to the root with an edge with label "ab". So that node's depth would be 2 since "ab" has length 2, even though it's only 1 edge. – Brian Bi Aug 16 '19 at 19:27