I am working with the following data type:
data SuffixTree = Leaf Int | Node [(String, SuffixTree)]
deriving (Eq, Show)
Each subtree has a corresponding label (string).
The idea is to build the corresponding suffix tree by adding each suffix and its index into an accumulating tree (at the beginning it is Node []
).
This is already defined
buildTree s
= foldl (flip insert) (Node []) (zip (suffixes s) [0..length s-1])
where suffixes
is correctly defined.
I've been trying to implement the insert
function for a while but can't seem to succeed.
This is what I have now (the names and style are not the best since this is still work in progress):
insert :: (String, Int) -> SuffixTree -> SuffixTree
insert pair tree@(Node content)
= insert' pair tree content
where
insert' :: (String, Int) -> SuffixTree -> [(String, SuffixTree)] -> SuffixTree
insert' (s, n) (Node []) subtrees
= Node ((s, Leaf n) : subtrees)
insert' (s, n) (Node content@((a, tree) : pairs)) subtrees
| null p = insert' (s, n) (Node pairs) subtrees
| p == a = insert' (r, n) tree subtrees
| p /= a = Node ((p, newNode) : (subtrees \\ [(a, tree)]))
where
(p, r, r') = partition s a
newNode = Node [(r, (Leaf n)), (r', tree)]
The partition
function takes two strings and returns a tuple consisting of:
- The common prefix (if it exists)
- The first string without the prefix
- The second string without the prefix
I think I understand the rules needed to build the tree.
We start by comparing the label of the first subtree to the string we want to insert (say, str
). If they don't have a prefix in common, we try to insert in the next subtree.
If the label is a prefix of str
, we continue to look into that subtree, but instead of using str
we try to insert str
without the prefix.
If str
is a prefix of label, then we replace the existing subtree with a new Node
, having a Leaf
and the old subtree. We also adjust the labels.
If we don't have a match between str
and any label then we add a new Leaf
to the list of subtrees.
However, the biggest problem that I have is that I need to return a new tree containing the changes, so I have to keep track of everything else in the tree (not sure how to do this or if I'm thinking correctly about this).
The code appears to be working correctly on this string: "banana"
:
Node [("a",Node [("",Leaf 5),("na",Node [("",Leaf 3),("na",Leaf 1)])]),
("na",Node [("",Leaf 4),("na",Leaf 2)]),("banana",Leaf 0)]
However, on this string "mississippi"
I get an Exception: Non-exhaustive patterns in function insert'
.
Any help or ideas are greatly appreciated!