
I have tried going through the theory in the paper http://webglimpse.net/pubs/suffix.pdf

But I am kind of lost when they say:

Let A_i be the first suffix in the first bucket (i.e., Pos[0] = i), and consider A_{i-h} (if i-h < 0, then we ignore A_i and take the suffix of Pos[1], and so on). Since A_i starts with the smallest h-symbol string, A_{i-h} should be the first in its 2h-bucket.

I am not able to understand this statement. Why can A_i be ignored if i-h < 0? And how is the position of A_{i-h} determined in constant time when i-h > 0 in phase 1?

One sample implementation is http://belbesy.wordpress.com/2012/10/10/spoj-649-distinct-substrings-suffix-arrays-nlgn/

mkj
self_noted

1 Answer


I strongly recommend that, instead of trying to understand the C++ code, you walk through this Python implementation of the Manber-Myers suffix array construction algorithm by hand, for a simple 5-character example.

The Python version is only about 15 lines of code, so it's pretty easy to follow.

Even if you don't understand Python, treat it as pseudocode and Google the syntax you don't understand.

Personally, I walked through one 5-character string by hand, and that was enough to help me understand how the algorithm works.

Gino
  • The linked code **is not an implementation of the Manber-Myers algorithm**. "Manber and Myers suggested an algorithm that is in principle an MSD radix sort, but where the number of passes is reduced to at most log(n) **by taking advantage of the fact that each suffix is a prefix of another one: the order of the suffixes in the previous sorting pass is used as the keys for the preceding suffixes in the next pass**, each time doubling the number of considered symbols per suffix." ([Source](http://www.larsson.dogma.net/tr204.pdf)) -- Continued – nspo Mar 06 '21 at 18:58
  • (Continued) The logic in the code above is similar, but there is no usage of the order of the keys outside of the current bucket calculated in the previous step. Instead, in the line `for k, v in sorted(d.items()):`, the prefixes (up to the current limit) are sorted with string comparisons, possibly walking through many characters in each prefix - but this is not necessary as you could use the already calculated order from the previous step. (Continued) – nspo Mar 06 '21 at 19:00
  • (Continued) Just doubling the amount of evaluated characters in each step and putting that into a sorting algorithm does not lead to the same advantages as already knowing the relative order because it can be elegantly extracted from the results of the previous step. – nspo Mar 06 '21 at 19:00
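
To make the point in these comments concrete, here is a minimal Python sketch of a doubling pass that reuses the order from the previous pass instead of re-comparing strings. It is only an illustration under my own assumptions, not the paper's in-place code: the names `suffix_array_doubling`, `bstart`, `filled`, `place` and `key` are made up for this example, and bucket boundaries are rebuilt from `(bucket, next bucket)` keys rather than with the paper's flag arrays. It also touches both points in the question: a suffix `j` with `j - h < 0` is skipped because no suffix starts at a negative index, and suffix `j - h` goes into the next free slot of its own h-bucket, which a per-bucket counter gives in constant time.

```python
def suffix_array_doubling(s):
    """Suffix array by prefix doubling with a bucket scan (illustrative sketch)."""
    n = len(s)
    if n == 0:
        return []
    # Stage h = 1: order the suffixes by their first symbol only.
    pos = sorted(range(n), key=lambda i: s[i])
    # bstart[i] = index in pos where the h-bucket containing suffix i begins.
    bstart = [0] * n
    for k in range(1, n):
        prev, cur = pos[k - 1], pos[k]
        bstart[cur] = bstart[prev] if s[prev] == s[cur] else k

    h = 1
    while h < n:
        filled = [0] * n   # slots already used, indexed by bucket start
        new_pos = [0] * n

        def place(i):
            # Constant time: next free slot inside suffix i's h-bucket.
            b = bstart[i]
            new_pos[b + filled[b]] = i
            filled[b] += 1

        # Suffixes with at most h symbols have nothing after position h,
        # so each one comes first in its h-bucket.
        for i in range(n - h, n):
            place(i)
        # Scan suffixes in h-order; suffix j fixes the rank of suffix j - h.
        for j in pos:
            if j - h >= 0:     # otherwise no suffix starts at j - h: skip
                place(j - h)

        def key(i):
            # Suffixes share a 2h-bucket iff both halves share h-buckets.
            return (bstart[i], bstart[i + h] if i + h < n else -1)

        new_bstart = [0] * n
        for k in range(1, n):
            prev, cur = new_pos[k - 1], new_pos[k]
            new_bstart[cur] = new_bstart[prev] if key(prev) == key(cur) else k
        pos, bstart = new_pos, new_bstart
        h *= 2
    return pos


print(suffix_array_doubling("banana"))  # [5, 3, 1, 0, 4, 2]
```

Each pass is linear in the length of the string and there are at most about log2(n) passes, whereas sorting prefixes with string comparisons can walk through many characters per comparison.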