I'm looking into the O(N log N) suffix array implementation found at this link: https://sites.google.com/site/indy256/algo/suffix_array
I understand the core concepts, but I'm having trouble following the implementation in its entirety.

public static int[] suffixArray(CharSequence S) {
  int n = S.length();
  Integer[] order = new Integer[n];
  for (int i = 0; i < n; i++)
    order[i] = n - 1 - i;

  // stable sort of characters
  Arrays.sort(order, (a, b) -> Character.compare(S.charAt(a), S.charAt(b)));

  int[] sa = new int[n];
  int[] classes = new int[n];
  for (int i = 0; i < n; i++) {
    sa[i] = order[i];
    classes[i] = S.charAt(i);
  }
  // sa[i] - suffix on i'th position after sorting by first len characters
  // classes[i] - equivalence class of the i'th suffix after sorting by first len characters

  for (int len = 1; len < n; len *= 2) {
    int[] c = classes.clone();
    for (int i = 0; i < n; i++) {
      // condition sa[i - 1] + len < n simulates 0-symbol at the end of the string
      // a separate class is created for each suffix followed by simulated 0-symbol
      classes[sa[i]] = i > 0 && c[sa[i - 1]] == c[sa[i]] && sa[i - 1] + len < n && c[sa[i - 1] + len / 2] == c[sa[i] + len / 2] ? classes[sa[i - 1]] : i;
    }
    // Suffixes are already sorted by first len characters
    // Now sort suffixes by first len * 2 characters
    int[] cnt = new int[n];
    for (int i = 0; i < n; i++)
      cnt[i] = i;
    int[] s = sa.clone();
    for (int i = 0; i < n; i++) {
      // s[i] - order of suffixes sorted by first len characters
      // (s[i] - len) - order of suffixes sorted only by second len characters
      int s1 = s[i] - len;
      // sort only suffixes of length > len, others are already sorted
      if (s1 >= 0)
        sa[cnt[classes[s1]]++] = s1;
    }
  }
  return sa;
}
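
For checking the routine's output on small inputs, a brute-force version can serve as a reference. This is only a sketch: the class and method names are hypothetical, not from the linked page, and it compares whole suffixes as strings, so it is far slower than the doubling approach.

```java
import java.util.Arrays;

class NaiveSuffixArray {
    // Sorts suffix start positions by comparing the suffixes as whole strings.
    // A shorter suffix that is a prefix of a longer one sorts first, which
    // matches the simulated 0-symbol at the end of the string.
    static int[] naiveSuffixArray(String s) {
        int n = s.length();
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++)
            order[i] = i;
        Arrays.sort(order, (a, b) -> s.substring(a).compareTo(s.substring(b)));
        int[] sa = new int[n];
        for (int i = 0; i < n; i++)
            sa[i] = order[i];
        return sa;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(naiveSuffixArray("banana")));
    }
}
```

For "banana" this prints [5, 3, 1, 0, 4, 2], i.e. the suffixes a, ana, anana, banana, na, nana.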

I'm wondering what the cnt[] array is for and where it comes into play. Any pointers would be helpful.

Thanks.

Gokul M
  • Good setup to the question, but the question itself is way too vague. What exactly are you wondering about it? What kinds of pointers? In other words, can you tell us more specifically what it is you want to know? – Erick G. Hagstrom Aug 25 '15 at 17:29
  • @ErickG.Hagstrom As I understand it, classes[] holds the sort index: e.g., {ab,ab,ac,de,de} would give {0,0,2,3,3}, and sa[] holds the suffix array sorted by the first len characters. I just can't figure out how cnt[] helps in sorting by the next len characters. I worked through it with pen and paper, but it doesn't make much sense. – Gokul M Aug 25 '15 at 18:30
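
A possible reading of cnt[], offered as a sketch rather than as the linked page's own explanation: because the first suffix of each class gets classes[...] = i, a class id doubles as the first slot of that class's bucket in sa, so cnt[c] (initialized to c) acts as the next free slot of bucket c. Walking s in order and taking s[i] - len visits suffixes in increasing order of their second len characters, and dropping each one at cnt[classes[s1]]++ leaves every bucket ordered by that second half. The Java below replays only that placement step on the {ab,ab,ac,de,de} example; the class name, the place helper, and the second-half arrival order {1,4,2,3,0} are made up for illustration.

```java
import java.util.Arrays;

class CntBucketSketch {
    // classes[x]     - class of item x by its first half; the id is the
    //                  first slot of that class's bucket in the output
    // bySecondHalf[] - item ids listed in increasing order of their second half
    //                  (hypothetical input, standing in for s[i] - len)
    // returns item ids ordered by (first half, second half)
    static int[] place(int[] classes, int[] bySecondHalf) {
        int n = classes.length;
        int[] cnt = new int[n];
        for (int i = 0; i < n; i++)
            cnt[i] = i; // cnt[c] = next free slot in the bucket starting at slot c
        int[] out = new int[n];
        for (int item : bySecondHalf)
            out[cnt[classes[item]]++] = item; // fill the slot, then advance the cursor
        return out;
    }

    public static void main(String[] args) {
        int[] classes = {0, 0, 2, 3, 3};      // first halves: ab, ab, ac, de, de
        int[] bySecondHalf = {1, 4, 2, 3, 0}; // assumed order of the second halves
        System.out.println(Arrays.toString(place(classes, bySecondHalf)));
    }
}
```

With that arrival order the result is [1, 0, 2, 4, 3]: items 1 and 0 fill slots 0 and 1 of the ab bucket, item 2 takes slot 2, and items 4 and 3 fill slots 3 and 4 of the de bucket. No comparisons are needed, which is what keeps each doubling round linear.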
