I am trying to solve the longest common substring problem for two strings. I have reduced it to the following: I built a generalized suffix tree, and as I understand it, the longest common substring corresponds to the deepest path consisting of nodes that belong to both strings.
My test input is:
String1 = xabc
String2 = abc
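For this input the expected answer is abc. As a way to cross-check whatever the tree traversal returns, here is a minimal sketch of the classic dynamic-programming approach. It is a different technique from the suffix tree and is only meant as a baseline; the class and method names (LcsBaseline, longestCommonSubstring) are made up for illustration.

```java
final class LcsBaseline {
    // len[i][j] holds the length of the longest common suffix of
    // a[0..i) and b[0..j); the largest entry marks the answer.
    static String longestCommonSubstring(String a, String b) {
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int best = 0;    // length of the best match found so far
        int endInA = 0;  // exclusive end index of that match in a
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    len[i][j] = len[i - 1][j - 1] + 1;
                    if (len[i][j] > best) {
                        best = len[i][j];
                        endInA = i;
                    }
                }
            }
        }
        return a.substring(endInA - best, endInA);
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("xabc", "abc")); // prints "abc"
    }
}
```

This runs in O(n*m) time and space, so it is only suitable for checking small inputs against the tree-based result.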
It seems that the tree I build is correct, but my problem is with the following method (I initially pass it the root of the tree):
    private void getCommonSubstring(SuffixNode node) {
        if (node == null)
            return;
        if (node.from == ComesFrom.Both) {
            current.add(node);
        }
        else {
            if (max == null || current.size() > max.size()) {
                max = current;
            }
            current = new ArrayList<SuffixNode>();
        }
        for (SuffixNode n : node.children) {
            getCommonSubstring(n);
        }
    }
What I was aiming to do: to find the deepest path whose nodes belong to both strings, I traverse the tree in pre-order and add nodes that belong to both strings to a list (current). Once I reach a node that is not part of both, I update the max list if current is bigger.
But the code is wrong, and I am unsure how to implement this, since I haven't written code for general (non-binary) trees in ages.
Could you help me figure this out?
Update:
Modified as per @templatetypedef's suggestion. I could not make this work either.
    private void getCommonSubstring(SuffixNode node, List<SuffixNode> nodes) {
        if (node == null)
            return;
        if (node.from == ComesFrom.Both) {
            nodes.add(node);
        }
        else {
            if (max == null || current.size() > max.size()) {
                max = nodes;
            }
            nodes = new ArrayList<SuffixNode>();
        }
        for (SuffixNode n : node.children) {
            List<SuffixNode> tmp = new ArrayList<SuffixNode>(nodes);
            getCommonSubstring(n, tmp);
        }
    }
    public class SuffixNode {
        Character character;
        Collection<SuffixNode> children;
        ComesFrom from;
        Character endMarker;
    }
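
For comparison, here is a minimal sketch of how the traversal described above could be written. It is not the original tree code: Node is a hypothetical stand-in for SuffixNode with one character per node, insertSuffixes is a naive O(n^2) builder assumed only so the example runs on its own, and end markers are left out. Three things differ from the attempts above: the current path is a single list that is extended before descending and shortened again afterwards (backtracking), max is updated every time the path grows rather than only when a non-shared node is reached, and max stores a copy of the path instead of a reference to it. The descent stops at the first node that is not marked Both, since every prefix of a common substring is itself common and so nothing below such a node can be shared.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CommonSubstringSketch {
    enum ComesFrom { First, Second, Both }

    // Hypothetical stand-in for SuffixNode: one character per node.
    static final class Node {
        final char character;
        final Map<Character, Node> children = new LinkedHashMap<>();
        ComesFrom from;
        Node(char character, ComesFrom from) {
            this.character = character;
            this.from = from;
        }
    }

    // Naive builder (an assumption, not the original one): inserts every
    // suffix of s and marks nodes reached by both strings as Both.
    static void insertSuffixes(Node root, String s, ComesFrom owner) {
        for (int start = 0; start < s.length(); start++) {
            Node node = root;
            for (int i = start; i < s.length(); i++) {
                char c = s.charAt(i);
                Node child = node.children.get(c);
                if (child == null) {
                    child = new Node(c, owner);
                    node.children.put(c, child);
                } else if (child.from != owner) {
                    child.from = ComesFrom.Both;
                }
                node = child;
            }
        }
    }

    private final List<Node> path = new ArrayList<>(); // nodes from root to the current node
    private List<Node> max = new ArrayList<>();        // deepest shared path seen so far

    private void dfs(Node node) {
        for (Node child : node.children.values()) {
            if (child.from != ComesFrom.Both) {
                continue; // nothing below a non-shared node can be shared
            }
            path.add(child);
            if (path.size() > max.size()) {
                max = new ArrayList<>(path); // copy, so later changes to path do not affect max
            }
            dfs(child);
            path.remove(path.size() - 1); // backtrack before trying the next sibling
        }
    }

    static String longestCommonSubstring(String a, String b) {
        Node root = new Node('\0', ComesFrom.Both);
        insertSuffixes(root, a, ComesFrom.First);
        insertSuffixes(root, b, ComesFrom.Second);
        CommonSubstringSketch sketch = new CommonSubstringSketch();
        sketch.dfs(root);
        StringBuilder result = new StringBuilder();
        for (Node n : sketch.max) {
            result.append(n.character);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("xabc", "abc")); // prints "abc"
    }
}
```

With a compressed suffix tree (edge labels longer than one character) the same traversal would need to compare string depth rather than node count, so this sketch only applies directly to the one-character-per-node layout shown in SuffixNode.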