I am studying suffix tree implementations, and below is one reference implementation. My question is how "indexes" (see line 19) is used in class SuffixTreeNode. I am not sure whether "indexes" is useful; I think we probably only need to keep the nodes and the character values of their children. I could not find much use for "indexes" in class SuffixTreeNode.

Please feel free to correct me. Any insights are appreciated.

public class SuffixTree {
    SuffixTreeNode root = new SuffixTreeNode();
    public SuffixTree(String s) {
        for (int i = 0; i < s.length(); i++) {
            String suffix = s.substring(i);
            root.insertString(suffix, i);
        }
    }

    public ArrayList<Integer> getIndexes(String s) {
        return root.getIndexes(s);
    }
}

public class SuffixTreeNode {
    HashMap<Character, SuffixTreeNode> children =
            new HashMap<Character, SuffixTreeNode>();
    char value;
    ArrayList<Integer> indexes = new ArrayList<Integer>();
    public SuffixTreeNode() { }

    public void insertString(String s, int index) {
        indexes.add(index);
        if (s != null && s.length() > 0) {
            value = s.charAt(0);
            SuffixTreeNode child = null;
            if (children.containsKey(value)) {
                child = children.get(value);
            } else {
                child = new SuffixTreeNode();
                children.put(value, child);
            }
            String remainder = s.substring(1);
            child.insertString(remainder, index);
        }
    }

    public ArrayList<Integer> getIndexes(String s) {
        if (s == null || s.length() == 0) {
            return indexes;
        } else {
            char first = s.charAt(0);
            if (children.containsKey(first)) {
                String remainder = s.substring(1);
                return children.get(first).getIndexes(remainder);
            }
        }
        return null;
    }
}

public class Question {
    public static void main(String[] args) {
        String testString = "mississippi";
        String[] stringList = {"is", "sip", "hi", "sis"};
        SuffixTree tree = new SuffixTree(testString);
        for (String s : stringList) {
            ArrayList<Integer> list = tree.getIndexes(s);
            if (list != null) {
                System.out.println(s + ": " + list.toString());
            }
        }
    }
}
River
Lin Ma
  • I ran your code and it's working as designed. If you had a bug there, say you weren't shortening the string on each call, you would get a StackOverflowError, because you would never hit your recursive base case and the recursion would never end. – Marquis Blount Oct 07 '15 at 07:07

1 Answer


indexes is definitely needed in the suffix tree implementation you are looking at (there are multiple versions of suffix trees, some more optimized than others). The indexes variable is what lets getIndexes return to the caller the indices where each substring (is, sip, hi, sis) occurs in the original string (mississippi). getIndexes returns indexes in its base case; this is how you get the list of occurrences of each substring. The substring hi does not occur, so getIndexes returns null for it and nothing is printed. See the output below:

is: [1, 4]
sip: [6]
sis: [3]
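
To see why every node keeps its own indexes list, it can help to look at the list stored at each node along the path for one query. The following is a minimal sketch, not the code from the question: IndexesTrace, Node, build, and trace are hypothetical names, and Node is a simplified stand-in for SuffixTreeNode without the value field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IndexesTrace {
    // Simplified stand-in for SuffixTreeNode (hypothetical, no `value` field).
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        List<Integer> indexes = new ArrayList<>();

        void insert(String s, int index) {
            // Every node on the path records where this suffix starts in the text.
            indexes.add(index);
            if (!s.isEmpty()) {
                children.computeIfAbsent(s.charAt(0), c -> new Node())
                        .insert(s.substring(1), index);
            }
        }
    }

    // Inserts every suffix of `text`, tagged with its starting position.
    static Node build(String text) {
        Node root = new Node();
        for (int i = 0; i < text.length(); i++) {
            root.insert(text.substring(i), i);
        }
        return root;
    }

    // Collects the indexes list at each node on the path for `query`, root first.
    // Stops early if the path does not exist in the tree.
    static List<List<Integer>> trace(Node root, String query) {
        List<List<Integer>> steps = new ArrayList<>();
        Node node = root;
        steps.add(node.indexes);
        for (char c : query.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                break;
            }
            steps.add(node.indexes);
        }
        return steps;
    }

    public static void main(String[] args) {
        List<List<Integer>> steps = trace(build("mississippi"), "is");
        for (int depth = 0; depth < steps.size(); depth++) {
            System.out.println("after " + depth + " chars: " + steps.get(depth));
        }
    }
}
```

For the query is, the list narrows from all eleven suffix starts at the root, to [1, 4, 7, 10] after i, to [1, 4] after is. Without a per-node list, the tree could only say whether a substring exists, not where.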
Marquis Blount
  • Thanks Marquis, what do you mean by "in its base case"? – Lin Ma Oct 07 '15 at 00:13
  • Hi Marquis, I am wondering whether the solution I posted has a bug: on line 34, "child.insertString(remainder, index)", should index be index + 1? Thanks. – Lin Ma Oct 07 '15 at 06:08
  • @Lin Ma A base case is used in a recursive method. It is the condition that tells the method to stop calling itself and return back up the stack. The base case is line 39 in your code. No, there isn't a bug; index is incremented in the for loop on lines 4-6. – Marquis Blount Oct 07 '15 at 07:00
  • Thanks Marquis for the details, but I do not quite understand the index issue. Suppose we insert the whole string "mississippi": is index always 0 for all of its characters? And when we insert the suffix "ississippi", is index always 1 for all of its characters? I thought index should be the position of the character in the original string, and should increase as we build the SuffixTreeNode for each next character in the suffix. Why is the value of index the same for all characters of a given suffix? Please feel free to correct me. Thanks. – Lin Ma Oct 07 '15 at 07:51
  • @LinMa Take a look at the output that is generated; step through the code as if you were the compiler and you will come to understand it. Look at the substring "is": it prints indices 1 and 4, which is correct. – Marquis Blount Oct 07 '15 at 18:28
  • @LinMa I hate giving responses like that, but it really is true: you have to convince yourself of what the application is doing recursively. I would recommend working through recursion problems; there are many good tutorials on YouTube. It will take multiple videos, you won't get it after just one. – Marquis Blount Oct 07 '15 at 19:36
  • Marquis, thanks. I did some work and want to confirm my understanding: does the index mean the beginning index of the match? For example, the output for sip is 6, which is the index of the character 's' rather than the character 'p', correct? I had mistakenly thought we should output the index of the character 'p', which is why I am asking. :) – Lin Ma Oct 07 '15 at 20:41
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/91668/discussion-between-marquis-blount-and-lin-ma). – Marquis Blount Oct 07 '15 at 21:30
  • @LinMa I didn't completely understand your explanation, but it goes like this. Yes, you are correct: index means the beginning of a match. For the substring "sip" the index is 6 because, in the original word, the match begins at index 6 and covers elements [6][7][8]. The same goes for the substring "is": it has indices 1 and 4 because those are the locations in the original word where that substring matches. If you found my answer helpful and it answered your original question, don't forget to accept it. Let me know if you need anything else. – Marquis Blount Oct 07 '15 at 21:41
  • Thanks Marquis, you have clarified all of my confusion. Marked as answered. Super cool. :) – Lin Ma Oct 08 '15 at 00:21
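
The point settled in the comments above, that each reported index is where a match begins rather than where it ends, can be checked independently of the tree. The following is a minimal sketch using plain String.indexOf as a brute-force reference; MatchStartCheck and startIndexes are hypothetical names, not part of the code in the question.

```java
import java.util.ArrayList;
import java.util.List;

class MatchStartCheck {
    // Brute-force reference: every position where `query` begins in `text`.
    static List<Integer> startIndexes(String text, String query) {
        List<Integer> starts = new ArrayList<>();
        if (query.isEmpty()) {
            return starts; // guard: an empty query would otherwise never terminate
        }
        int from = text.indexOf(query);
        while (from >= 0) {
            starts.add(from);
            // Advance by one (not by query.length()) so overlapping matches are found.
            from = text.indexOf(query, from + 1);
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "mississippi";
        for (String query : new String[] {"is", "sip", "hi", "sis"}) {
            for (int start : startIndexes(text, query)) {
                // Position of the last matched character, derived from the start.
                int end = start + query.length() - 1;
                System.out.println(query + " starts at " + start + ", ends at " + end);
            }
        }
    }
}
```

For sip this reports a start of 6 and an end of 8, matching the tree's output of 6 for the position of 's'. The end position is always recoverable as start + length - 1, so storing the start alone loses nothing.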