I am trying to write a suffix tree class using the naive O(m^2) construction algorithm rather than Ukkonen's.

My question is about how to represent the tree. This is what I have so far, but I do not think it is the right class structure for nodes and edges. Any suggestions on how the relationship between nodes and edges should be maintained? One important point is that each edge stores only a start and end index into the string, to save space.

class suffixTree{
Node[] nodes;
Edge[] edge;
}

class Node{
  int node_id;
  Edge[] outgoingEdges;
}

class Edge{
  int start_index;
  int end_index;
  int start_node_id;
  int end_node_id;
}
Andy897
  • I don't see why you need `nodes` and `edges`. You should just need a reference to the root node. Every leaf node should contain a reference to its offset in the string. `Edge.start_node_id` is unnecessary. – Niklas B. Feb 24 '15 at 12:56
  • Thanks a lot for replying. Can you please elaborate a bit, maybe with some class structure? – Andy897 Feb 24 '15 at 14:44
  • I told you the changes I would make to the class structure. The rest is okay. – Niklas B. Feb 24 '15 at 16:20

1 Answer

I would do it this way:

import java.util.HashMap;
import java.util.Map;

class SuffixTree {
    // A tree only needs to know about the root Node.
    Node root = new Node();
}

// A Node contains a mapping from the first character of each
// outgoing edge label to the Edge that starts with it.
// It doesn't need to know about anything else.
class Node {
    Map<Character, Edge> edgeMap = new HashMap<>();
}

// An Edge contains only the start and the end index of the substring
// and the destination Node.
class Edge {
    int startIndex;
    int endIndex;
    Node destination;
}

The most important changes are:

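  1. Getting rid of redundant information in all three classes: the `nodes` and `edge` arrays in the tree, `node_id` in the node, and `start_node_id`/`end_node_id` in the edge.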

  2. Using references instead of arrays and indices.
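
To show how this structure can be used, here is a minimal sketch of the naive O(m^2) construction the question asks about, built on the three classes above. Several details are my own assumptions rather than part of the original design: `endIndex` is treated as exclusive, a `'$'` terminator is appended so that every suffix ends at its own leaf, and the helper names `insertSuffix`, `contains`, and `countLeaves` are only illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class Node {
    // First character of each outgoing edge label -> that edge.
    final Map<Character, Edge> edgeMap = new HashMap<>();
}

class Edge {
    int startIndex;   // inclusive index into the text
    int endIndex;     // exclusive index into the text (assumed convention)
    Node destination;

    Edge(int startIndex, int endIndex, Node destination) {
        this.startIndex = startIndex;
        this.endIndex = endIndex;
        this.destination = destination;
    }
}

class SuffixTree {
    final String text;
    final Node root = new Node();

    SuffixTree(String input) {
        // Assumption: '$' is reserved as a terminator and must not occur in the input.
        if (input.indexOf('$') >= 0) {
            throw new IllegalArgumentException("input must not contain '$'");
        }
        this.text = input + "$";
        // Naive construction: insert every suffix, longest first.
        for (int i = 0; i < text.length(); i++) {
            insertSuffix(i);
        }
    }

    private void insertSuffix(int suffixStart) {
        Node node = root;
        int pos = suffixStart;
        while (pos < text.length()) {
            Edge edge = node.edgeMap.get(text.charAt(pos));
            if (edge == null) {
                // No edge starts with this character: hang a new leaf here.
                node.edgeMap.put(text.charAt(pos), new Edge(pos, text.length(), new Node()));
                return;
            }
            // Walk along the edge label while it agrees with the suffix.
            int k = edge.startIndex;
            while (k < edge.endIndex && pos < text.length()
                    && text.charAt(k) == text.charAt(pos)) {
                k++;
                pos++;
            }
            if (k < edge.endIndex) {
                // Mismatch inside the label: split the edge at k.
                Node middle = new Node();
                middle.edgeMap.put(text.charAt(k), new Edge(k, edge.endIndex, edge.destination));
                edge.endIndex = k;
                edge.destination = middle;
                // The unique terminator guarantees pos is still inside the text here.
                middle.edgeMap.put(text.charAt(pos), new Edge(pos, text.length(), new Node()));
                return;
            }
            node = edge.destination; // whole label matched, continue below it
        }
    }

    // True if pattern occurs in the text (patterns containing '$' are not meaningful).
    boolean contains(String pattern) {
        Node node = root;
        int p = 0;
        while (p < pattern.length()) {
            Edge edge = node.edgeMap.get(pattern.charAt(p));
            if (edge == null) {
                return false;
            }
            for (int k = edge.startIndex; k < edge.endIndex && p < pattern.length(); k++, p++) {
                if (text.charAt(k) != pattern.charAt(p)) {
                    return false;
                }
            }
            node = edge.destination;
        }
        return true;
    }

    // Number of leaves; with the terminator this equals the number of suffixes.
    int countLeaves() {
        return countLeaves(root);
    }

    private static int countLeaves(Node node) {
        if (node.edgeMap.isEmpty()) {
            return 1;
        }
        int total = 0;
        for (Edge e : node.edgeMap.values()) {
            total += countLeaves(e.destination);
        }
        return total;
    }
}
```

For example, `new SuffixTree("banana").contains("nan")` returns true and `contains("nab")` returns false. Note that splitting an edge only changes its `endIndex` and `destination`, which is why the edge does not need to know its source node.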

kraskevich