Questions tagged [suffix-array]

A suffix array is a data structure that represents the lexicographically sorted list of all suffixes of a string (in the computer-science, not the linguistics, sense of the word suffix). It is the basis for many high-performance algorithms performed on very large strings, for example full-text search or compression.


Formal definitions

String: A string is an ordered sequence of symbols, each taken from a predefined, finite set. That set is called the alphabet, or character set. The symbols are often referred to as characters.

Suffix: Given a string T of length n, a suffix of T is defined as a substring that starts at any position of T and ends at position n (the end of T).

Example: Let T:=abc; then abc, bc and c are suffixes of T, but a and ab are not.

Remark: Any string T of length n has exactly n distinct non-empty suffixes (as many as there are characters in it), because each position of T is the start of exactly one suffix, and suffixes of different lengths cannot be equal.
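As a quick illustration of the definition and the remark above, here is a minimal Python sketch that lists the suffixes of a string, one per starting position. The function name `suffixes` is hypothetical, chosen only for this example.

```python
def suffixes(t: str) -> list[str]:
    # Suffix i is the substring starting at position i and running to the end of t,
    # so a string of length n yields exactly n non-empty suffixes.
    return [t[i:] for i in range(len(t))]


print(suffixes("abc"))  # ['abc', 'bc', 'c']
```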

Suffix array: Given a string T of length n, and a linear ordering on the alphabet, the suffix array of T is the lexicographically sorted list of all suffixes of T.

Example: Let T:=abcabx and assume the 'natural' alphabetic ordering, i.e. a < b < c < ... < x < y < z. Then the suffix array of T is as follows.

abcabx
abx
bcabx
bx
cabx
x
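The example can be reproduced by brute force: generate every suffix and sort the resulting strings. This Python sketch (the helper name `sorted_suffixes` is hypothetical) relies on Python's built-in string comparison, which is lexicographic by code point and therefore matches the a < b < ... < z ordering assumed above for lowercase ASCII letters. It only illustrates the definition and is not meant as an efficient construction.

```python
def sorted_suffixes(t: str) -> list[str]:
    # Python compares str values lexicographically by code point; a proper
    # prefix (e.g. "ab") sorts before any longer string that extends it ("abx").
    return sorted(t[i:] for i in range(len(t)))


for s in sorted_suffixes("abcabx"):
    print(s)
```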

Implementation

In practice, the sorted suffixes are usually not stored explicitly in memory. Instead, the suffix array is represented as a list of integers, each giving the starting position of a suffix.

abcabx
012345

Example: Given T as defined above, and numbering its positions from 0 to 5, the suffix array is represented as the list [0,3,1,4,2,5].
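A minimal Python sketch of the integer representation, assuming the naive approach of sorting the start positions 0..n-1 with the suffix that begins at each position as the sort key. The name `suffix_array` is a hypothetical helper for this example. Because comparing two suffixes can take O(n) character comparisons, this runs in O(n^2 log n) time in the worst case, and the slices t[i:] used as keys occupy O(n^2) memory in total; specialized construction algorithms avoid both costs.

```python
def suffix_array(t: str) -> list[int]:
    # Sort start positions by the suffix beginning at each one.
    # Naive: the key t[i:] copies the suffix, and each comparison may scan O(n) chars.
    return sorted(range(len(t)), key=lambda i: t[i:])


t = "abcabx"
sa = suffix_array(t)
print(sa)                   # [0, 3, 1, 4, 2, 5]
print([t[i:] for i in sa])  # ['abcabx', 'abx', 'bcabx', 'bx', 'cabx', 'x']
```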

The suffix-array tag

Many of the questions tagged suffix-array are related to one of the topics below.

  • How to construct suffix arrays efficiently
  • How to store, and possibly compress, them efficiently
  • How to make use of them for various purposes, such as full-text search, detection of regularities in strings and text-compression
  • How they are used in various fields of application, in particular bioinformatics, genetics and natural language processing
  • What existing and/or ready-to-use implementations of any of the above are known
  • Worst-case, average-case and empirical comparisons of time and space requirements of existing algorithms and implementations
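For the full-text search use case listed above, the usual idea is that all suffixes starting with a pattern p form one contiguous block of the suffix array, which two binary searches can locate. The Python sketch below is an illustration under simple assumptions: it rebuilds the array naively, the names `suffix_array` and `find_occurrences` are hypothetical, and each step compares only the first len(p) characters of a suffix, giving O(m log n) character comparisons for a pattern of length m.

```python
def suffix_array(t: str) -> list[int]:
    # Naive construction: sort start positions by their suffixes.
    return sorted(range(len(t)), key=lambda i: t[i:])


def find_occurrences(t: str, sa: list[int], p: str) -> list[int]:
    """Return the start positions of all occurrences of p in t, in increasing order."""
    m, n = len(p), len(sa)

    # Lower bound: first rank whose suffix, cut to m chars, is >= p.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose suffix, cut to m chars, is > p.
    hi = n
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid

    # Ranks first..lo-1 are exactly the suffixes that start with p.
    return sorted(sa[first:lo])


text = "abcabx"
sa = suffix_array(text)
print(find_occurrences(text, sa, "ab"))  # [0, 3]
print(find_occurrences(text, sa, "z"))   # []
```

The search is correct because cutting sorted suffixes to their first m characters keeps them in non-decreasing order, so the matches cannot be scattered.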
154 questions
0 votes, 1 answer

How is the cost of suffix array generation O(n^2 log n)?

To build a suffix array on a string of n characters, we first generate the n suffixes in O(n) and then sort them in O(n log n), so the total time complexity appears to be O(n) + O(n log n) = O(n log n). But I am reading that it is O(n^2 log n) and could not…
Aadith Ramia
0 votes, 1 answer

Suffix tree vs Suffix array for LCS

I'm working on a program to find the longest common substring among multiple strings. I've narrowed my approach down to using either suffix arrays or a suffix tree. I want to see which is the better approach (if there is one) and why. Also for…
zeus_masta_funk
0 votes, 1 answer

cocos2d-iphone: spritesheet depending on screen resolution?

cocos2d adds suffixes to resources in a similar way to how "@2x" works for regular iOS apps. I also want to place these pictures into a spritesheet. The problem is that a default cocos2d spritesheet is represented as one png and one plist file with sprite…
Gargo
0 votes, 1 answer

How much space is suffix array using?

I just checked out http://en.wikipedia.org/wiki/Suffix_array about suffix arrays, and it says a suffix array requires O(n log n) space, while if the size of the alphabet is sigma, the space required will be O(n log sigma) bits? I can't get the idea behind either of them... here…
Timothy Leung
0 votes, 1 answer

Meaning of S[3i]S[3i + 1]S[3i + 2]

I have trouble understanding the following: We have the string ABRACADABRA. We divide this into groups, for example: S is divided into the group: S0 = where <> signifies an array and S[i] signifies the…
Cratylus
0 votes, 2 answers

Suffix array and search a substring in a string

I found an implementation of a suffix array in Ruby and changed it a bit. Here is what I have: class SuffixArray def initialize(str) @string = str @suffix_array = [] (0...str.length).each do |i| substring =…
Alan Coromano
0 votes, 1 answer

Minimum rotation using Suffix array-- Revisited

Consider a string of length n (1 <= n <= 100000). Determine its minimum lexicographic rotation. For example, the rotations of the string “alabala” are: alabala labalaa abalaal balaala alaalab laalaba aalabal and the smallest among…
username_4567
0 votes, 1 answer

Unable to understand the concept mentioned in http://pine.cs.yale.edu/pinewiki/SuffixArrays

Please explain: Suppose we have a suffix array corresponding to an n-character text and we want to find all occurrences in the text of an m-character pattern. Since the suffixes are ordered, the easiest solution is to do binary search for the first…
Shashank Jain
0 votes, 5 answers

substring calculation in a string

I am having difficulty finding a better approach than O(n^2) for the following question. I am given a string, e.g. xyxxz. Now I need to find the total number of matching characters in each prefix of the given string. Here, the possible prefixes of the string are: …
vijay
0 votes, 2 answers

Maximal substrings search

Given a string S consisting of lowercase Latin letters, I want to find, for each position i, the maximum length L[i] for which there exists a position i' < i such that s[i'..i'+L[i]-1] = s[i..i+L[i]-1]. For example: s = ababaab, L = {0,0,3,2,1,2,1}. I want…
rodart
-1 votes, 1 answer

How to implement suffix array in C without using qsort?

I searched for implementations of suffix arrays in C, but all the programs I saw were in C++ and used sort(). I am not sure how I can use C's built-in qsort() in place of C++'s sort(). Can we implement suffix arrays without…
user13233820
-1 votes, 1 answer

I have created a suffix array but I don't know what is wrong with this code

import java.util.Arrays; import java.util.Scanner; public class SuffixArray { public static class Tuple implements Comparable{ public Integer originalIndex; public Integer firstHalf; public Integer secondHalf; @Override …
-1 votes, 3 answers

Binary search to find longest common prefix

For a school assignment, we are implementing a suffix array, with methods for building it and for finding the longest common prefix. I managed to build and sort the suffix array quite easily but struggle with the LCP. I am trying to find the longest…
Isus
-1 votes, 1 answer

Suffix array construction

I am learning suffix array construction from this link. Here is the code that I have ported from C++ to Java: class Entry implements Comparable { int [] nr = new int[2]; int p=0; public int compareTo(Entry that){ if…
sam manual
-1 votes, 3 answers

Check whether s1 has the prefix/suffix AAA and assign the result to a boolean variable b

Hello, here is the question: Check whether s1 has the prefix AAA and assign the result to a boolean variable b. This is what I have so far: /** * * @author…
Sam777