I'm working on an implementation of Ukkonen's linear-time suffix tree construction algorithm, and I plan to implement the improvements suggested by, e.g., Kurtz and N. J. Larsson (for example, edge links instead of suffix links).
While testing, I got a mix of passing and failing results depending on the specific strings I used, and I had similar experiences with a few implementations I found online. This made me wonder:
Are there any known, purpose-built (preferably simple/short) strings for unit-testing suffix trees that ensure the algorithm works correctly in all branching scenarios?
Furthermore, are there any good methods to separate the testing of the tree building algorithm from the testing of the traversal/lookup algorithm?
I know this question doesn't have a single specific correct answer, but I think it could serve as a good reference point for people working on similar algorithms.
My current unit-testing approach is quite primitive (C# with NUnit):
[TestCase]
public void Contains_Simple_ShouldReturnTrue()
{
    var s = "bananasbanananananananananabananas";
    var st = SuffixTree.Build(s);
    var t1 = s.Substring(0, 10);
    Assert.IsTrue(st.Contains(t1));
}
// ... Other simple test cases
[TestCase]
// This test fails, but it's not particularly helpful for bugfixing
public void Contains_DynamicBarrage_OnLongString_ShouldReturnTrue()
{
    const int CYCLES = 200,
              MAXLEN = 200;
    var s = "olbafuynhfcxzqhnebecxjrfwfttw"; // Shortened for sanity
    var st = SuffixTree.Build(s);
    var r = new Random();
    for (int i = 0; i < CYCLES; i++)
    {
        var pos = r.Next(0, s.Length - 2);
        var len = r.Next(1, Math.Min(s.Length - pos, MAXLEN));
        Assert.IsTrue(st.Contains(s.Substring(pos, len)));
    }
}