Questions tagged [suffix-array]

A suffix array is a data structure that represents the lexicographically sorted list of all suffixes of a string (in the computer-science, not the linguistics, sense of the word suffix). It is the basis for many high-performance algorithms performed on very large strings, for example full-text search or compression.

Formal definitions

String: A string is an ordered sequence of symbols, each taken from a pre-defined, finite set. That set is called the alphabet, or character set. The symbols are often referred to as characters.

Suffix: Given a string T of length n, a suffix of T is defined as a substring that starts at any position of T and ends at position n (the end of T).

Example: Let T:=abc. Then abc, bc and c are suffixes of T, but a and ab are not.

Remark: Any string T of length n has exactly n distinct non-empty suffixes (as many as there are characters in it), because every position of T is the beginning of exactly one suffix, and suffixes starting at different positions have different lengths.
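The definition and the remark can be checked with a few lines of Python. This is only a sketch: the function name suffixes is made up for illustration, and positions are counted from 0, as Python does.

```python
def suffixes(t: str) -> list[str]:
    """Return all non-empty suffixes of t, one per starting position."""
    # t[i:] is the substring that starts at position i and runs to the end of t.
    return [t[i:] for i in range(len(t))]


print(suffixes("abc"))  # ['abc', 'bc', 'c']

# A string of length n has n distinct suffixes, even with repeated characters.
assert len(set(suffixes("abcabx"))) == len("abcabx")
```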

Suffix array: Given a string T of length n, and a linear ordering on the alphabet, the suffix array of T is the lexicographically sorted list of all suffixes of T.

Example: Let T:=abcabx and assume the 'natural' alphabetic ordering, i.e. a < b < c < d... < x < y < z. Then the suffix array of T is as follows.

abcabx
abx
bcabx
bx
cabx
x

Implementation

The sorted suffixes themselves are usually not stored explicitly in memory. Instead, the suffix array is represented as a list of integers, each giving the starting position of one suffix in T.

a b c a b x
0 1 2 3 4 5

Example: Given T as defined above, and assuming its positions are numbered from 0 to 5, the suffix array is represented as the list [0,3,1,4,2,5].
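The integer representation can be reproduced with a naive construction that sorts the starting positions by the suffix beginning at each of them. This is a minimal sketch for illustration, not one of the efficient construction algorithms; the name build_suffix_array is an assumption, and Python's built-in string comparison stands in for the alphabetic ordering (it agrees with a < b < ... < z for lowercase ASCII letters).

```python
def build_suffix_array(t: str) -> list[int]:
    """Naive suffix array: starting positions sorted by their suffixes."""
    # Each comparison may look at up to n characters, so this takes roughly
    # O(n^2 log n) time in the worst case. Fine for small examples only.
    return sorted(range(len(t)), key=lambda i: t[i:])


t = "abcabx"
sa = build_suffix_array(t)
print(sa)  # [0, 3, 1, 4, 2, 5]

# Expanding each position recovers the sorted list of suffixes.
for i in sa:
    print(i, t[i:])
```

Storing n integers instead of n strings keeps the memory use linear in the length of T, since each suffix can be recovered from T and its starting position.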

The suffix-array tag

Many of the questions tagged suffix-array are related to one of the topics below.

  • How to construct suffix arrays efficiently
  • How to store, and possibly compress, them efficiently
  • How to make use of them for various purposes, such as full-text search, detection of regularities in strings and text compression
  • How they are used in various fields of application, in particular bioinformatics, genetics and natural language processing
  • What existing and/or ready-to-use implementations of any of the above are known
  • Worst-case, average-case and empirical comparisons of time and space requirements of existing algorithms and implementations
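
As an illustration of the full-text search use case listed above: because the suffixes are sorted, all suffixes that start with a given pattern form one contiguous range of the suffix array, and that range can be found by binary search. The sketch below assumes a naively built suffix array and uses made-up names (build_suffix_array, find_occurrences); it is not taken from any particular library.

```python
def build_suffix_array(t: str) -> list[int]:
    """Naive suffix array: starting positions sorted by their suffixes."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def find_occurrences(t: str, sa: list[int], p: str) -> list[int]:
    """Return all starting positions of pattern p in t, in increasing order."""
    m = len(p)

    # Lower bound: first suffix whose first m characters are >= p.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > p.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with p.
    return sorted(sa[start:lo])


t = "abcabx"
sa = build_suffix_array(t)
print(find_occurrences(t, sa, "ab"))  # [0, 3]
print(find_occurrences(t, sa, "b"))   # [1, 4]
print(find_occurrences(t, sa, "zz"))  # []
```

Each binary search makes O(log n) comparisons of at most len(p) characters, so a lookup does not need to scan the whole text once the suffix array exists.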
154 questions
-1
votes
1 answer

Suffixing each member of a Java array of Strings

Suffixing of each array member by the same suffix (.wav) is required as: String [] a = {"one", "two", "three"}; String str = ".wav" Required output : String[] a ={"one.wav", "two.wav", "three.wav"}; I tried to achieve this in the following…
Harjit Singh
-2
votes
2 answers

Finding all shortest unique substrings that have the same length?

Given a string sequence which contains only four letters, ['a','g','c','t'] for example: agggcttttaaaatttaatttgggccc. Find all the shortest unique sub-string of the string sequence which are of equal length (the length should be minimum of all the…
vinay_raj
-2
votes
1 answer

Frequency of precipitation above 95th percentile

So I have gridded precipitation data with dimension (324,72,144) being time, lon, lat respectively. I want to count the frequency of this data that is above the 95th percentile but I am really confused since this data is gridded. I would be…
-3
votes
1 answer

How to implement an algorithm to solve the following homework problem in O(n) time with a suffix tree?

I have a question in bioinformatics. You can solve it by suffix tree structure. Given a string S=S[1…n] and a number k, we want to find the smallest sub-string of S that occurs in S exactly k times, if it exists. How to solve this problem in O(n)…