Suffix Automaton:
A suffix automaton (or directed acyclic word graph) is a powerful
data structure that makes it possible to solve many string problems.
For example, with a suffix automaton you can find all occurrences of
one string in another, or count the number of distinct substrings of
a given string; both tasks can be solved in linear time.
On an intuitive level, a suffix automaton can be understood as a
compressed representation of all the substrings of a given string.
Impressively, it holds all this information in such a compact form
that for a string of length n it requires only O(n) memory. Moreover,
it can also be built in O(n) time (if the alphabet size k is
considered constant; otherwise, in O(n log k) time).
Historically, the linear size of the suffix automaton was first
established in 1983 by Blumer et al., and in 1985-1986 the first
linear-time construction algorithms were presented (Crochemore;
Blumer et al.). For more detail, see the references at the end of the
article.
The plural of "suffix automaton" is "suffix automata". The structure
is also known as a "directed acyclic word graph" (or simply "DAWG").
Definition of the suffix automaton:
Definition. The suffix automaton for a given string s is the minimal
deterministic finite automaton that accepts all suffixes of the
string s.
Let us unpack this definition.
- A suffix automaton is a directed acyclic graph in which the vertices
are called states and the arcs are the transitions between these
states.
- One of the states, t_0, is called the initial state, and it must be
the source of the graph (i.e., all other states are reachable from
it).
- Each transition in the automaton is an arc labeled with some
character. All transitions leaving a given state must have different
labels. (On the other hand, a state may have no transition for some
characters.)
- One or more of the states are marked as terminal states. If we go
from the initial state t_0 along any path to any terminal state and
write out the labels of all the arcs traversed, we get a string that
must be one of the suffixes of the string s.
- The suffix automaton contains the minimum number of vertices among
all automata that satisfy the conditions above. (A minimum number of
transitions is not required separately: once the number of states is
minimal, the automaton cannot contain "extra" paths, because they
would violate the previous property.)
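To make the conditions above concrete, here is a minimal sketch in
Python of a hand-built automaton for the example string "abb". The
string, the state numbering (0 stands for t_0), the table layout and
the helper name accepts are illustrative assumptions, not part of the
definition. The sketch also assumes that the empty string counts as a
suffix, so the initial state is marked terminal.

```python
from itertools import product

# Hand-built automaton for the hypothetical example string s = "abb".
# State 0 plays the role of the initial state t_0.
S = "abb"
TRANSITIONS = {
    0: {"a": 1, "b": 4},
    1: {"b": 2},
    2: {"b": 3},
    3: {},
    4: {"b": 3},
}
# Assumption: the empty suffix counts, so the initial state is terminal too.
TERMINAL = {0, 3, 4}


def accepts(word):
    """Follow transitions from state 0; accept if we end in a terminal state."""
    state = 0
    for ch in word:
        if ch not in TRANSITIONS[state]:
            return False  # this state has no transition for the character
        state = TRANSITIONS[state][ch]
    return state in TERMINAL


# Compare the accepted strings with the suffixes of S, trying every
# string over {a, b} of length 0 .. len(S) + 1.
suffixes = {S[i:] for i in range(len(S) + 1)}
candidates = {"".join(p) for n in range(len(S) + 2) for p in product("ab", repeat=n)}
accepted = {w for w in candidates if accepts(w)}
```

On this small example, accepted and suffixes are the same set: "",
"b", "bb" and "abb". A string such as "ab" can be read from t_0 but
ends in a non-terminal state, so it is not accepted.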
Elementary properties of the suffix automaton:
The simplest, and yet the most important, property of the suffix
automaton is that it contains information about all the substrings of
the string s. Namely, for any path that starts in the initial state
t_0, the labels of the arcs along the path, written out in order,
form a substring of s. Conversely, any substring of s corresponds to
some path starting in the initial state t_0.
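This correspondence can be checked by brute force on a small case.
The sketch below assumes a hand-built transition table for the
example string "abb" (state 0 stands for t_0); the table and the
helper name path_strings are illustrative assumptions. It enumerates
every path from the initial state and compares the strings spelled
out with the set of substrings, the empty string included.

```python
# Hand-built transitions of the automaton for the hypothetical string "abb".
S = "abb"
TRANSITIONS = {
    0: {"a": 1, "b": 4},
    1: {"b": 2},
    2: {"b": 3},
    3: {},
    4: {"b": 3},
}


def path_strings(state=0, prefix=""):
    """Yield the string spelled by every path that starts in state 0."""
    yield prefix
    for ch, nxt in TRANSITIONS[state].items():
        yield from path_strings(nxt, prefix + ch)


paths = sorted(path_strings())
# All substrings of S, including the empty one (the empty path).
substrings = sorted({S[i:j] for i in range(len(S) + 1) for j in range(i, len(S) + 1)})
```

Because the graph is acyclic, the recursion terminates, and because
transitions from one state carry different labels, no string is
produced twice. Here both lists contain the same six strings, so the
number of paths from t_0 equals the number of distinct substrings.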
To simplify the explanations, we will say that a substring
corresponds to the path from the initial state whose labels form that
substring. Conversely, we will say that any path corresponds to the
string formed by the labels of its arcs.
One or more paths from the initial state lead into each state of the
suffix automaton. We will say that a state corresponds to the set of
strings that correspond to all of these paths.
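As a small illustration of the strings a state corresponds to, the
sketch below groups the strings spelled by all paths from the initial
state according to the state in which each path ends. The transition
table for the example string "abb" and the helper name
strings_by_state are illustrative assumptions (state 0 stands for
t_0).

```python
from collections import defaultdict

# Hand-built transitions of the automaton for the hypothetical string "abb".
TRANSITIONS = {
    0: {"a": 1, "b": 4},
    1: {"b": 2},
    2: {"b": 3},
    3: {},
    4: {"b": 3},
}


def strings_by_state():
    """Map each state to the set of strings whose paths from state 0 end there."""
    groups = defaultdict(set)
    stack = [(0, "")]  # (current state, string spelled so far)
    while stack:
        state, word = stack.pop()
        groups[state].add(word)
        for ch, nxt in TRANSITIONS[state].items():
            stack.append((nxt, word + ch))
    return dict(groups)


groups = strings_by_state()
for state in sorted(groups):
    print(state, sorted(groups[state]))
```

In this example only state 3 is reached by more than one path, so it
corresponds to two strings, "abb" and "bb"; every other state
corresponds to a single string.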
EXAMPLES:
