I just read the Wikipedia article on suffix arrays: http://en.wikipedia.org/wiki/Suffix_array
It says a suffix array requires O(n log n) space, and that when the size of the alphabet is sigma, the space required is O(n log sigma) bits?

I can't work out where either of these bounds comes from.

Here is what I know about a suffix array.
A suffix array is an integer array with n integers. So does it take O(n)*8 bytes, since each integer needs 8 bytes? And for the array itself, do we need O(n) bytes, assuming there are n characters?

jogojapan
Timothy Leung

1 Answer

Indeed a suffix array – assuming no compression technique is used – is an array of integers. But an integer does not require exactly 8 bytes.

How many bits do you need to store an integer? The answer depends on the range of the integer. If the range is [0,2), i.e. the only numbers you are ever interested in representing are 0 and 1, then you need 1 bit to store that integer.

If your range is [0,4), i.e. you want to represent 0, 1, 2 and 3, then you need two bits: 00, 01, 10 and 11 are the four possible combinations of the two bits you can use to represent the four different numbers.

If the range is up to 8 numbers you need 3 bits, for up to 16 you need 4, etc. Generally speaking, for a range of R different numbers, you need ceil(log2(R)) bits.
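To make the rule above concrete, here is a minimal Python sketch. The function name bits_needed is made up for this illustration; it computes ceil(log2(R)) with integer arithmetic so that floating-point rounding cannot affect the result.

```python
def bits_needed(r: int) -> int:
    """Number of bits needed to tell apart r different values,
    i.e. ceil(log2(r)). The name is hypothetical, chosen for this sketch."""
    if r < 1:
        raise ValueError("the range must contain at least one value")
    # The largest value to represent is r - 1; its binary length is
    # exactly ceil(log2(r)) for every r >= 1.
    return (r - 1).bit_length()


for r in (2, 4, 8, 16, 5):
    print(r, bits_needed(r))
```

A range of 5 values needs 3 bits, the same as a range of 8, because the bit count is rounded up.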

How many bits do you need for the suffix array? I'll assume the length of the text is N characters. Then the length of the suffix array is also N, and each of its integers refers to a text position, i.e. the range of each integer is [0,N). Hence you need ceil(log2(N)) bits to store each integer, and since there are N integers in total, the total space requirement is N ceil(log2(N)) bits (not including space taken for the text itself).
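As a rough illustration of this calculation, the Python sketch below builds a suffix array naively (by sorting suffixes, which is fine for a toy example but not how real implementations work) and packs its entries using ceil(log2(N)) bits each. All function names are assumptions made for this example.

```python
def suffix_array(text: str) -> list[int]:
    """Naive construction: sort the start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bits_per_entry(n: int) -> int:
    """ceil(log2(n)): bits needed for a text position in the range [0, n)."""
    return (n - 1).bit_length()


def pack(entries: list[int], width: int) -> int:
    """Store every entry in exactly `width` bits inside one big integer."""
    packed = 0
    for i, value in enumerate(entries):
        packed |= value << (i * width)
    return packed


def unpack(packed: int, width: int, count: int) -> list[int]:
    """Recover the entries written by pack()."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


text = "banana"
sa = suffix_array(text)               # [5, 3, 1, 0, 4, 2]
width = bits_per_entry(len(text))     # 3 bits, since ceil(log2(6)) = 3
total_bits = len(sa) * width          # 6 * 3 = 18 bits
assert unpack(pack(sa, width), width, len(sa)) == sa
print(sa, width, total_bits)
```

For "banana", N = 6, so each entry fits in 3 bits and the whole array in 18 bits, not counting the text itself.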

(But note that much of the recent research on suffix arrays is about compressing them, i.e. finding ways to use only O(N) bits (this is called a succinct representation), or even less, i.e. o(N) bits (true compression). The simple calculation above applies only to the standard case where no compression techniques are used whatsoever.)

(Also note that in practice, many implementations will simply use an unsigned int or something similar to represent each integer, and then you get N*sizeof(int)*CHAR_BIT for the size requirement in bits.)
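To compare the two figures, here is a small Python sketch. It uses ctypes to query the size of the platform's C unsigned int and assumes CHAR_BIT is 8, which holds on common platforms but is an assumption here. The function names are made up for this example.

```python
import ctypes


def packed_bits(n: int) -> int:
    """N * ceil(log2(N)): the tightly packed, uncompressed size in bits."""
    return n * (n - 1).bit_length()


def fixed_width_bits(n: int, char_bit: int = 8) -> int:
    """N * sizeof(unsigned int) * CHAR_BIT, with CHAR_BIT assumed to be 8."""
    return n * ctypes.sizeof(ctypes.c_uint) * char_bit


n = 1_000_000
print("packed:     ", packed_bits(n), "bits")       # 20 bits per entry
print("fixed width:", fixed_width_bits(n), "bits")  # depends on the platform
```

With N = 1,000,000 each packed entry needs 20 bits, whereas a typical 32-bit unsigned int spends 32 bits per entry; the fixed-width figure depends on the platform the script runs on.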

jogojapan