This question is based on this answer by jogojapan.

In that answer, he notes that for some suffix tree/suffix array algorithms it is sufficient for the sentinel character $ to be unique, while others additionally require $ to compare lexicographically smallest or largest.

Reading Abouelhoda et al.'s paper Replacing suffix trees with enhanced suffix arrays, I see that they require $ to be larger than any other character. With this choice, they are able to construct efficient algorithms that simulate both bottom-up and top-down suffix tree traversal, along with various applications built on these traversal schemes.

On the other hand, algorithms for efficiently constructing the suffix array or LCP array using induced sorting make the opposite choice: $ must be lexicographically smallest (see Linear Suffix Array Construction by Almost Pure Induced-Sorting by Nong et al. and Inducing the LCP-Array by Johannes Fischer).

It's not immediately obvious to me whether these choices for the ordering of $ are necessary or were made just for convenience. It would strike me as extremely unfortunate if the fastest SA/LCP-array construction algorithms couldn't be used with many of the efficient algorithms that operate on suffix arrays.

  1. Do the induced sorting construction methods strictly require that $ be lexicographically smallest, or do they work equally well (or with minor modifications) if I choose $ to be lexicographically largest?
  2. If induced sorting does require $ to be smallest, do the algorithms Abouelhoda et al. present for emulating top-down/bottom-up suffix tree traversal still apply when $ is lexicographically smallest, and if not, can they be slightly modified so that they do?
  3. If neither 1 nor 2 works out, are there completely different algorithms that perform similar tasks when $ is chosen to be lexicographically smallest? What are they, if they exist?
helloworld922

1 Answer

If it ever matters, then you can just add another sentinel.

I'm pretty sure you can get induced sorting to work with a largest-value sentinel, but if you can't, or if you just don't want to bother figuring out how, then append a largest-value sentinel to the text first and then append the smallest-value sentinel that the algorithm requires.

This adds just one extra suffix to the suffix array (the one consisting only of the smallest-value sentinel, which always sorts first), so you can easily remove it, and the remaining suffixes will be in the order you require.
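
Here is a minimal sketch of the idea in Python, not a definitive implementation. The function names are made up for illustration, and the naive sort in `suffix_array_smallest_sentinel` is only a stand-in for whatever induced-sorting constructor (e.g. SA-IS) you actually use; all it assumes is that the constructor wants a unique smallest symbol at the end.

```python
def suffix_array_smallest_sentinel(seq):
    """Stand-in for an induced-sorting constructor (hypothetical name).

    Requires that seq ends with a unique, smallest symbol. The naive
    O(n^2 log n) sort here is for illustration only.
    """
    assert seq[-1] == min(seq) and seq.count(seq[-1]) == 1
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def suffix_array_largest_sentinel(text):
    """Suffix array of text + '$' where '$' compares LARGER than any character."""
    # Shift character codes up by one so that 0 is free for the small sentinel.
    codes = [ord(c) + 1 for c in text]
    large = max(codes, default=0) + 1  # larger than every symbol of the text
    small = 0                          # smaller than every symbol of the text

    # Append the largest sentinel first, then the smallest one.
    seq = codes + [large, small]
    sa = suffix_array_smallest_sentinel(seq)

    # The suffix consisting only of the small sentinel always sorts first.
    assert sa[0] == len(seq) - 1
    # Dropping it leaves the suffixes of text + '$' with '$' as the largest symbol.
    return sa[1:]


if __name__ == "__main__":
    print(suffix_array_largest_sentinel("banana"))
```

The reason the remaining order is correct: any two suffixes that start inside the text or at the large sentinel are already distinguished at or before the large sentinel, so the trailing small sentinel never takes part in a comparison between them.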

Matt Timmermans