Questions tagged [suffix-array]

A suffix array is a data structure that represents the lexicographically sorted list of all suffixes of a string (in the computer-science, not the linguistics, sense of the word suffix). It is the basis for many high-performance algorithms performed on very large strings, for example full-text search or compression.


Formal definitions

String: A string is an ordered sequence of symbols, each taken from a pre-defined, finite set. That set is called the alphabet, or character set. The symbols are often referred to as characters.

Suffix: Given a string T of length n, a suffix of T is defined as a substring that starts at any position of T and ends at position n (the end of T).

Example: Let T := abc. Then abc, bc and c are suffixes of T, but a and ab are not.

Remark: Any string T of length n has exactly n distinct suffixes (as many as there are characters in it), because any character is the beginning of exactly one suffix.
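
The definition and the remark above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `suffixes` is an assumption made for this example and does not come from any library.

```python
def suffixes(t):
    """Return all non-empty suffixes of t, one per starting position."""
    # The slice t[i:] starts at position i and runs to the end of t.
    return [t[i:] for i in range(len(t))]


if __name__ == "__main__":
    print(suffixes("abc"))  # the suffixes of T := abc
    # One suffix per starting position, and all of them differ in length.
    print(len(set(suffixes("abcabx"))) == len("abcabx"))
```

Each suffix has a different length, which is why the n suffixes are always pairwise distinct.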

Suffix array: Given a string T of length n, and a linear ordering on the alphabet, the suffix array of T is the lexicographically sorted list of all suffixes of T.

Example: Let T:=abcabx and assume the 'natural' alphabetic ordering, i.e. a < b < c < d... < x < y < z. Then the suffix array of T is as follows.

abcabx
abx
bcabx
bx
cabx
x

Implementation

The sorted suffixes themselves are usually not stored explicitly in memory. Instead, the suffix array is represented as a list of integers, each giving the starting position of one suffix in T.

abcabx
012345

Example: Given T as defined above, with its positions numbered from 0 to 5, the suffix array is represented as the list [0,3,1,4,2,5].
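
A minimal sketch of this integer representation in Python follows. It sorts the starting positions by comparing the suffixes directly, which is simple but slow (roughly O(n² log n) in the worst case); it is not one of the efficient construction algorithms. The name `build_suffix_array` is an assumption made for this example.

```python
def build_suffix_array(t):
    """Return the starting positions of the suffixes of t in sorted order."""
    # Sort the positions 0..n-1, using the suffix starting at each position
    # as the sort key. Only the integers are kept in the result.
    return sorted(range(len(t)), key=lambda i: t[i:])


if __name__ == "__main__":
    t = "abcabx"
    sa = build_suffix_array(t)
    print(sa)  # [0, 3, 1, 4, 2, 5]
    for i in sa:
        print(i, t[i:])  # each position next to the suffix it stands for
```

Any suffix can be recovered from its integer on demand as `t[i:]`, so nothing is lost by storing only the positions.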

The suffix-array tag

Many of the questions tagged suffix-array are related to one of the topics below.

  • How to construct suffix arrays efficiently
  • How to store, and possibly compress, them efficiently
  • How to make use of them for various purposes, such as full-text search, detection of regularities in strings, and text compression
  • How they are used in various fields of application, in particular bioinformatics, genetics and natural language processing
  • What existing and/or ready-to-use implementations of any of the above are known
  • Worst-case, average-case and empirical comparisons of time and space requirements of existing algorithms and implementations
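
As an illustration of the full-text search use case listed above, the following Python sketch finds all occurrences of a pattern by binary search over the suffix array. It relies on the fact that all suffixes starting with the pattern form one contiguous block in the sorted order. The function names are assumptions made for this example, and the naive construction is used only to keep the sketch self-contained.

```python
def build_suffix_array(t):
    """Naive construction: sort starting positions by their suffixes."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def find_occurrences(t, sa, p):
    """Return the sorted starting positions of all occurrences of p in t."""
    m = len(p)

    # Lower bound: first index whose suffix, cut to m characters, is >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first index whose suffix, cut to m characters, is > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with p.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    t = "abcabx"
    sa = build_suffix_array(t)
    print(find_occurrences(t, sa, "ab"))  # [0, 3]
    print(find_occurrences(t, sa, "z"))   # []
```

Each binary search performs O(log n) comparisons of at most len(p) characters, so a query costs O(len(p) · log n) once the suffix array exists.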
154 questions
1
vote
1 answer

Efficient All substring counting in sorted order

You are given a string; find the frequency of all substrings, sorted (in decreasing order) according to their frequency. E.g.: ababa {"a", "b", "a", "b", "a", "ab", "ba", "ab", "ba", "aba", "bab", "aba", "abab", "baba",…
nil96
  • 313
  • 1
  • 3
  • 12
1
vote
1 answer

Generate array[0-k] for number of distinct sub-strings that are exactly 0 to k-repeated of a string using Suffix Array + LCP

I searched the internet: I found many solutions for k-repeated substrings using a suffix tree, but none using a suffix array. Given string: abaababb Maximum number of repeated sub-strings, k = length of string = 6 initially a[0..k]={0} Frequency of…
Nakshatra
  • 663
  • 1
  • 6
  • 14
1
vote
3 answers

Python Identifying Suffix within set of strings

I'm doing an exercise on CheckIO and I'm wondering why this won't work. Given a set of strings, I'm trying to return True if any of the strings are suffixes of any other string in the set, and False otherwise. Using itertools I'm generating the…
SpicyClubSauce
  • 4,076
  • 13
  • 37
  • 62
1
vote
1 answer

Find one occurrence of substring using suffix array

I'm trying to figure out how to binary search in a suffix array for one occurrence of a pattern. Let's have a text: petertomasjohnerrnoerror. I try to find er. SA is a suffix array of this text:…
Milano
  • 18,048
  • 37
  • 153
  • 353
1
vote
1 answer

Suffix Array Implementation Error

I keep getting compiler errors with an implementation of a suffix array by Arrays.sort. I get the following errors: a cannot be resolved to a variable Syntax error on token ",", . expected Syntax error on token "-", -- expected a cannot…
user3624831
  • 195
  • 1
  • 2
  • 9
1
vote
1 answer

building Suffix array in O(n logn)

I am reading suffix array construction tutorials from codechef and stackoverflow as well. One point I could understand is that they say.. It works by first sorting the 2-grams(*), then the 4-grams, then the 8-grams, and so forth, of the original…
sad
  • 820
  • 1
  • 9
  • 16
1
vote
1 answer

LCP array for Suffix Array

How to compute the LCP array for a suffix array? It doesn't have to be the most efficient. O(n log n) or O(n) will do. Something relatively easy to code if possible.
1
vote
1 answer

Implementing Longest Common Substring using Suffix Array

I am using this program for computing the suffix array and the Longest Common Prefix. I am required to calculate the longest common substring between two strings. For that, I concatenate strings, A#B and then use this algorithm. I have Suffix Array…
user3080029
  • 553
  • 1
  • 8
  • 19
1
vote
4 answers

Longest Common Substring

We have two strings a and b respectively. The length of a is greater than or equal to b. We have to find out the longest common substring. If there are multiple answers then we have to output the substring which comes earlier in b (earlier as in…
user3318603
  • 221
  • 3
  • 12
1
vote
1 answer

Why DC3 cannot be used as DC2 in suffix array?

I am reading the paper about DC3 to construct suffix array. I am wondering why DC3 cannot be applied as DC2, so that the calculation will be faster?
yun wang
  • 11
  • 2
1
vote
1 answer

sorting suffixes by qsort

I am trying to sort the suffixes of a string with qsort() but I am not getting the sorted list. What should I do? Here is what I have done: char str[MAXN]="banana", *a[MAXN]; for(i=0;i
Aseem Goyal
  • 2,683
  • 3
  • 31
  • 48
1
vote
1 answer

Modifying a Generalised Suffix Tree to hold number of times a node appears in the text string

How do I modify the procedure in Ukkonen's paper to hold a value for number of times a word appears in the text. Are there any such implementations available that provide the string frequency as well? The modification I want is like for a string…
Salena
  • 155
  • 1
  • 8
1
vote
2 answers

given a word forming a meaningful word by adding spaces in between them

You are given a string example "Iamastudent" without any spaces. You will be provided with a predefined dictionary function which verifies whether a given word is present in the dictionary or not. Using this function you have to insert the spaces in…
user1814884
1
vote
1 answer

longest common substring for 2/3 strings : suffix array vs dynamic programming approach

If I want to find the longest common substring for 2 strings then which approach will be more efficient in terms of time/space complexity: using suffix arrays or DP? DP will incur O(m*n) space with O(m*n) time complexity, what will be the time…
user1071840
  • 3,522
  • 9
  • 48
  • 74
1
vote
1 answer

How do you actually apply suffix arrays to any kind of text?

I am reading about Suffix Arrays and the code to build one is simple. But all the resources I have found usually use a trivial example text, which usually is banana, to explain the concept. So although the example text is trivial and the suffix…
Cratylus
  • 52,998
  • 69
  • 209
  • 339