Can we use circular strings with Suffix Trees?

Question

Can we use circular strings with Suffix Trees? So the last character is followed by the first in the list.

If so, how is the representation of this suffix tree different from a normal suffix tree?

I don't know about suffix trees, but suffix arrays for circular strings are used in [Burrows–Wheeler transform](http://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform). — Evgeny Kluev, Nov 15 '12 at 18:22
@icepack http://en.wikipedia.org/wiki/Linked_list#Circular_list — , Nov 15 '12 at 18:52
@icepack what the user means by circular strings is that the suffix trie of mississippi also considers pim, along with the other possibilities — Goaler444, Nov 15 '12 at 19:48

score 1 · Accepted Answer · answered Nov 16 '12 at 07:31

It depends on what you mean by "use".

1) Firstly, interpreting your question in the most direct possible way, consider a circular string of length n, i.e. an infinite string that repeats itself every n characters. Such an object has no suffixes in the usual sense of the word because it never ends, so you can't construct a suffix tree of it.

2) However, the idea certainly is that we have a finite representation of the circular string, which uses a link from the last character to the first. In a similar way, we can extend a given suffix tree through the use of links to a circular suffix tree that represents all the (infinitely long) suffixes of the circular string. Note that this cannot be done by inserting a link from each leaf to the root of the node, because from the root, there are outgoing edges for all suffixes of the string, but from a leaf of such a circular string, there can be only one outgoing edge. Example: The leaf that represents the suffix "ssippi$" of "mississippi$" should have an outgoing edge carrying an infinite label "mississippi$mississippi$mississippi$...." and no other edges. If you were to link it to the root of the tree, there would be many more incorrect continuations.

So two things are necessary:

Outgoing edges from leaves (which is a funny concept after all). Each leaf gets one outgoing edge.
Edges carrying infinite labels. This labels could be represented as circular strings (and that circular string would be the same for all outgoing edges of leaves).

This would give you a valid representation of all (infinite) suffixes of the circular string.

3) I am not really sure if that representation would be useful for anything. If the purpose of constructing the suffix tree is to enable substring searches, then the usual trick of concatenating the finite representation of the circular string (not including the link) to itself and constructing a suffix tree from this should suffice unless the substrings you search for are themselves longer than n characters.

It is also important to note that certain other uses of a suffix tree would require the introduction of further "infinite" concepts. For example, for certain applications it may be required to store the character depth of a tree node (i.e. the combined length of the edge labels leading from root to a particular node) in that node. In the "circular suffix tree" proposed above, the outgoing edges of leaves would lead to some sort of special "leaf in the limit" and carry a circular string as label. Any query that is matched into such a circular string would require a special way to keep track of the matching depth as there are no inner nodes on that edge to store the depth information.

4) Having said all of this, there is actually at least one known application of suffix trees to circular strings, but not in the sense of (1), (2) or (3) above, i.e. representing the entire infinite object through the use of a suffix tree. Rather, a suffix tree of a finite substring of the circular string is used to solve the problem of lexicographically minimal rotation. The problem is described on Wikipedia, although the solutions listed there do not include any that makes use of suffix trees. However, Dan Gusfield describes the solution in Algorithms on Strings, Trees and Sequences, in section 7.13.

The idea is that you consider the set of lexicographically minimal rotations of a string S of length n as equivalent to the set of the first length-n substrings of a circular string. The problem is then equivalent to that of finding a lexicographically minimal cut-off point. Gusfield solves it by constructing a suffix tree of the string SS$, traverses this tree by taking the lexicographically smallest edge at each node and thus ending up at a node that corresponds to the lexicographically smallest cut-off point.

So, as (4) demonstrates, there are certain practical "uses" of suffix trees in the context of circular strings, but I am unsure if that is the kind of use you are interested in.

Goaler444 · Answer 2 · 2012-11-15T19:45:45.270

0

Yes you can store circular strings given that the length of the strings are finite.

Lets consider a the word banban.

The following is the structure

root -> b -> a -> n -> b -> a -> n -> $

                    -> $

root -> a -> n -> b -> a -> n -> $

                -> $

root -> n -> b -> a -> n -> $

          -> $

The dollar sign represents that the termination of a suffix

EDIT:

A neat implementation of suffix trees using the Java Programming Language can be found here

EDIT: As asked in the comment section:

"What if I have the string mississippi and I want to search for 'pim'?"

pim is not a suffix of mississippi, and therefore the search will fail.

EDIT: But pim is in the circular string and I want to add it to the trie too

In order to do this, you must treat prim as a separate word and add it to the trie to form a global augmented suffix trie.

Consider anb to be in the circular string of the original word banban.

So the global augmented suffix trie would be:

root -> b -> a -> n -> b -> a -> n -> $ (original word)

          -> a -> n -> $ (original word)

          -> $ (from anb)

root -> a -> n -> b -> a -> n -> $ (original word)

               -> $ (original word)

               -> b -> $ (from anb)

root -> n -> b -> a -> n -> $ (original word)

                -> $ (from anb)

          -> $ (original word)

edited Nov 15 '12 at 19:45

answered Nov 15 '12 at 18:24

Goaler444

2,591
6
35
53

1

What if I have the string mississippi and I want to search for 'pim'? – Nov 15 '12 at 18:50
'pim' is not a suffix of 'mississippi' and therefore wont be in the tree. Thus it cannot be found. – Goaler444 Nov 15 '12 at 19:05
but it is in the circular string? – Nov 15 '12 at 19:29
You have to extract all the strings possible and then add them to the suffix tree individually, to from a global augmented suffix tree. – Goaler444 Nov 15 '12 at 19:31
Is it a good idea to hold a pointer in each leaf to the root? – Nov 15 '12 at 21:26
It wont work because every node would then have to lead to a $ node, Thus defeating the purpose – Goaler444 Nov 15 '12 at 22:01

score 0 · Answer 3 · answered Nov 15 '12 at 20:18

I would think about what you want this for, with the following workaround in mind:

If you had a suffix array of a circular string, this would mostly be a list of offsets inside the string, such that the sequences started at each offset were in sorted order.

Now suppose you had a circular string made by wrapping round ABCD. Consider the string formed by appending all but one of the characters of itself to it - ABCDABC, and what happens if you build a suffix array from it. All of the sequences in the circular string (ABCD BCDA CDAB DABC) appear inside ABCDABC, so when you build a suffix array from that you get the same suffix array as if you built it from a circular string, with some sequences having characters tacked on the end (ABCDABC instead of ABCD) and some extra sequences that are too short (ABC). You can recognise both of these cases just by looking at the length of the sub-sequence, or equivalently, its starting position within ABCDABC.

Clearly, you can find pim within mississippimississipp.

Can we use circular strings with Suffix Trees?

3 Answers3