
I am reading suffix array construction tutorials from CodeChef and Stack Overflow. One point I could not understand is where they say:

It works by first sorting the 2-grams, then the 4-grams, then the 8-grams, and so forth, of the original string S, so in the i-th iteration, we sort the 2^i-grams.

Each iteration i has two steps:

1. Sorting by 2^i-grams, using the lexicographic names from the previous iteration to enable comparisons in 2 steps (i.e. O(1) time) each
2. Creating new lexicographic names
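To make the quoted loop concrete, here is a minimal Python sketch of prefix doubling with a plain comparison sort. It is my own illustration, not code from the CodeChef tutorial, and the names `suffix_array_doubling` and `key` are hypothetical. The "lexicographic name" of each gram is an integer rank, so comparing two 2^(i+1)-grams is just comparing two integer pairs, which is the O(1) comparison the quote refers to. I assume that a second half running past the end of S gets the name -1, so a shorter suffix sorts before a longer one it is a prefix of.

```python
def suffix_array_doubling(s):
    """Suffix array by prefix doubling with a comparison sort (O(n log^2 n))."""
    n = len(s)
    if n == 0:
        return []
    # Names of the 1-grams: simply the character codes.
    rank = [ord(c) for c in s]
    sa = list(range(n))
    k = 1  # length of the grams that currently have names
    while True:
        def key(i):
            # A 2k-gram is (name of its first k-gram, name of its second k-gram);
            # -1 marks a second half that lies past the end of s.
            return (rank[i], rank[i + k] if i + k < n else -1)

        # Step 1: sort by 2k-grams; each comparison is an O(1) tuple comparison.
        sa.sort(key=key)
        # Step 2: create new names; equal 2k-grams get equal names.
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # every name distinct: suffixes fully sorted
            return sa
        k *= 2


print(suffix_array_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

With this naming, the 'ab' vs 'ac' case from the question is just (ord('a'), ord('b')) vs (ord('a'), ord('c')): the first components tie, and the second ones decide it in one integer comparison. Because Python's sort is comparison based, this version is O(n log^2 n); the answer below replaces it with radix sort.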

My question is: how can I use the indices computed for the 2-grams when sorting the 4-grams?

Suppose two of my suffixes are 'ab' and 'ac'. How can I compare them in O(1) time and assign them indices?

I really tried but got stuck there. An example would help. Thanks in advance.

sad

1 Answer


Let's assume that all substrings with length 2^k are sorted and now we want to sort all substrings with length 2^(k + 1). The key observation here is that any substring with length 2^(k + 1) is a concatenation of two substrings with length 2^k.
For example, in the string abacaba, the substring caba is the concatenation of ca and ba.
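Here is a small Python illustration of that concatenation idea on abacaba. For demonstration only, the classes of the length-2 substrings are obtained by sorting those substrings directly (the real algorithm gets them from the previous round), and `classes_of_length` is a hypothetical helper name.

```python
def classes_of_length(s, length):
    """Class of each length-`length` substring: its index among the sorted distinct ones.

    Computed by sorting the substrings directly, purely for illustration."""
    subs = [s[i:i + length] for i in range(len(s) - length + 1)]
    class_of = {sub: c for c, sub in enumerate(sorted(set(subs)))}
    return [class_of[sub] for sub in subs]


s = "abacaba"
half = 2
c2 = classes_of_length(s, half)  # [0, 2, 1, 3, 0, 2]: "ab"=0, "ac"=1, "ba"=2, "ca"=3
# A length-4 substring starting at i is the pair (class of first half, class of second half).
pairs = {s[i:i + 2 * half]: (c2[i], c2[i + half]) for i in range(len(s) - 2 * half + 1)}
print(pairs["caba"])  # (3, 2): "ca" has class 3, "ba" has class 2
# Sorting by the integer pairs gives the same order as sorting the strings themselves.
assert sorted(pairs, key=pairs.get) == sorted(pairs)
```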
But all substrings of length 2^k are already sorted, so we may assume that each of them is assigned an integer from the range [0 ... n - 1] (I will call it its class) based on its position in the sorted array of all substrings of this length (equal strings must be assigned equal numbers, and of course this array is not maintained explicitly). In this case, each substring of length 2^(k + 1) can be represented as a pair of two numbers (p1, p2): the classes of its first and second halves, respectively.

So all we need to do is sort an array of pairs of integers from the range [0 ... n - 1]. One can use radix sort to do this in linear time. After sorting these pairs, we can find the classes of all substrings of length 2^(k + 1) with a single pass over the sorted array: a pair gets the same class as its predecessor if the two pairs are equal, and the next class otherwise.
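Below is a minimal Python sketch of the approach this answer describes; `counting_sort` and `suffix_array_radix` are hypothetical names. Two assumptions go beyond the answer. First, substrings that would run past the end of the string are truncated, so the second component uses an extra value 0 for "empty" and real classes are shifted up by one, which makes the key range [0 ... n] rather than [0 ... n - 1]. Second, the initial classes come from ranking the distinct characters. Each round is an LSD radix sort on the pair (a stable counting sort by the second class, then by the first) followed by one pass to assign new classes, so each round is linear and the whole construction is O(n log n). Many implementations instead append a sentinel character and work with cyclic shifts; that is an equally valid choice.

```python
def counting_sort(items, key, max_key):
    """Stable counting sort of `items` by an integer key in [0, max_key]."""
    count = [0] * (max_key + 2)
    for it in items:
        count[key(it) + 1] += 1
    for v in range(max_key + 1):  # prefix sums: count[v] = first slot for key v
        count[v + 1] += count[v]
    out = [0] * len(items)
    for it in items:
        k = key(it)
        out[count[k]] = it
        count[k] += 1
    return out


def suffix_array_radix(s):
    """Prefix doubling where each round radix-sorts (class, class) pairs."""
    n = len(s)
    if n == 0:
        return []
    # Classes of the 1-grams: rank of each character among the distinct characters.
    alphabet = {c: i for i, c in enumerate(sorted(set(s)))}
    cls = [alphabet[c] for c in s]
    num_classes = len(alphabet)
    k = 1  # current half length
    while True:
        # Second component: class of the half starting at i + k, shifted by 1;
        # 0 means that half lies past the end of the string.
        second = [cls[i + k] + 1 if i + k < n else 0 for i in range(n)]
        # LSD radix sort on the pair: by second component, then stably by first.
        sa = counting_sort(list(range(n)), lambda i: second[i], num_classes)
        sa = counting_sort(sa, lambda i: cls[i], num_classes - 1)
        # Single pass over the sorted array to assign the new classes.
        new_cls = [0] * n
        for j in range(1, n):
            prev, cur = sa[j - 1], sa[j]
            same = cls[prev] == cls[cur] and second[prev] == second[cur]
            new_cls[cur] = new_cls[prev] + (0 if same else 1)
        cls = new_cls
        num_classes = cls[sa[-1]] + 1
        if num_classes == n:  # all classes distinct: suffixes fully sorted
            return sa
        k *= 2


print(suffix_array_radix("abacaba"))  # [6, 4, 0, 2, 5, 1, 3]
```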

kraskevich