1

I am having difficulty grasping the decoding algorithm for the Burrows-Wheeler transform (BWT). I've read about it online and gone through some sample code, but every implementation seems to use a 'primary index' to decode an encoded string.

My question is: how can we decode a BWT-encoded string like 'rdarcaaaabb' back to its original, 'abracadabra'?

Some sample code would be wonderful.

phs
DeepHouse

2 Answers

1

You want to look at http://www.phpclasses.org/package/3559-PHP-Compress-and-decompress-data-using-BWT-and-MTF.html.

Micromega
  • It's a great link! The only issue is that the program can only decode data it encoded itself. It cannot decode generic BWT data. – DeepHouse May 07 '11 at 15:40
  • Chunk size doesn't matter. The issue is that, while encoding the BWT, the program inserts a special EOF character and relies on it to decode. I was wondering if there is a way to decode the data if we don't have an EOF character. – DeepHouse May 08 '11 at 04:25
  • I tried voting. I'll accept it but it's not the best answer. :( – DeepHouse May 08 '11 at 08:55
  • Working on it now. Will post results if I have any success. – DeepHouse May 08 '11 at 12:05
  • @DeepHouse The EOF character is required to be able to decode the BWT-encoded word, I think. (Why would it otherwise *be* there, since it is not at the end of the encoded string?) – zrajm Sep 08 '13 at 21:19
0

The inverse is the easiest part of the algorithm: build a cumulative histogram of the symbols in the encoded string, then retrieve each symbol based on its rank among equal symbols.
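
As a rough illustration of the histogram-and-rank idea, here is a minimal Python sketch. It is not taken from the linked code; the function names `bwt_encode` and `bwt_decode` are made up for this example, and it assumes the encoder hands over a primary index (the row of the sorted rotation table that holds the original string) instead of an EOF marker. The encoder is the naive sorted-rotations version, included only so the round trip can be checked.

```python
def bwt_encode(text):
    """Naive BWT: return (last_column, primary_index) from sorted rotations."""
    n = len(text)
    # Sort rotation start offsets by the rotation they produce.
    rotations = sorted(range(n), key=lambda i: text[i:] + text[:i])
    # The last symbol of the rotation starting at i is text[i - 1].
    last = "".join(text[i - 1] for i in rotations)
    # The primary index is the row holding the unrotated input.
    return last, rotations.index(0)


def bwt_decode(last, primary_index):
    """Invert the BWT using a cumulative histogram plus per-symbol ranks."""
    counts = {}
    rank = []  # rank[i] = number of earlier occurrences of last[i]
    for ch in last:
        rank.append(counts.get(ch, 0))
        counts[ch] = counts.get(ch, 0) + 1

    # Cumulative histogram: first row of each symbol in the sorted first column.
    first_row = {}
    total = 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]

    # Walk backwards through the text, starting at the primary row.
    out = []
    row = primary_index
    for _ in range(len(last)):
        ch = last[row]
        out.append(ch)
        row = first_row[ch] + rank[row]
    return "".join(reversed(out))


encoded, index = bwt_encode("abracadabra")
print(encoded, index)               # rdarcaaaabb 2
print(bwt_decode(encoded, index))   # abracadabra
```

With this encoder, 'abracadabra' becomes 'rdarcaaaabb' with primary index 2. Passing a different index to `bwt_decode` returns a rotation of the original (index 0 gives 'aabracadabr'), which is why the last column alone, without a primary index or an EOF marker, only determines the input up to rotation.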

You can find a complete block compressor/decompressor based on the BWT here: http://code.google.com/p/kanzi/source/browse/java/src/kanzi/transform/BWT.java

flanglet