
Can anyone show me how to build a tree for LZW compression in C? Is it something like struct tree { short next[255]; }?

LSY

1 Answer


LZW's tree is more of a dictionary. Each entry consists of a code (the index of another entry) and a character. The dictionary is logically initialized with every single character; assuming 8-bit characters, these occupy the codes 0x00 to 0xff. They need not actually be stored in the dictionary, only treated as if they were.

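Say each dictionary entry consists of (code | char). The string "abcd" is then stored as a chain of entries: dictionary[100] = ('a' | 'b') for "ab", dictionary[101] = (100 | 'c') for "abc", and dictionary[102] = (101 | 'd') for "abcd". The indices 100 to 102 are only illustrative; real entries start above the single-character codes.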
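To answer the struct question more directly, here is a minimal sketch of that (code | char) layout in C. The names (lzw_entry, lzw_dict, lzw_dict_add, lzw_demo_abcd) are made up for illustration, and it assumes 8-bit characters, a first free code of 0x100 instead of the illustrative 100, and a 12-bit code limit. The single-character codes are not stored at all.

```c
#include <assert.h>

#define LZW_FIRST_CODE 0x100u  /* codes 0x00..0xff are the implicit single bytes */
#define LZW_MAX_CODES  4096u   /* assumed 12-bit code limit */

/* One dictionary entry: the string named by `prefix`, followed by byte `ch`. */
struct lzw_entry {
    unsigned short prefix;
    unsigned char  ch;
};

struct lzw_dict {
    /* Only multi-byte strings are stored; entry[0] holds code LZW_FIRST_CODE. */
    struct lzw_entry entry[LZW_MAX_CODES - LZW_FIRST_CODE];
    unsigned next_code;
};

void lzw_dict_init(struct lzw_dict *d)
{
    d->next_code = LZW_FIRST_CODE;
}

/* Append (prefix | ch) and return the code assigned to it. */
unsigned lzw_dict_add(struct lzw_dict *d, unsigned prefix, unsigned char ch)
{
    struct lzw_entry *e;

    assert(d->next_code < LZW_MAX_CODES);  /* dictionary not full */
    assert(prefix < d->next_code);         /* prefix must already exist */
    e = &d->entry[d->next_code - LZW_FIRST_CODE];
    e->prefix = (unsigned short)prefix;
    e->ch = ch;
    return d->next_code++;
}

/* Build the chain for "abcd" and return the code of the whole string. */
unsigned lzw_demo_abcd(struct lzw_dict *d)
{
    unsigned ab  = lzw_dict_add(d, 'a', 'b');  /* "ab"   */
    unsigned abc = lzw_dict_add(d, ab, 'c');   /* "abc"  */
    return lzw_dict_add(d, abc, 'd');          /* "abcd" */
}
```

Starting from a freshly initialized dictionary, lzw_demo_abcd() fills the first three slots and returns 0x102. A real encoder would only create these entries one at a time as the matching prefixes show up in the input.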

Note that the decoder has to use something like a stack to hold a string, since it gets the characters in reverse order. For example with the code 102, it would retrieve 'd' from [102], then 'c' from [101], then 'b' and 'a' from [100] in that order. The end (really the beginning) of a string is indicated when code < 0x100.
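The backward walk and the stack could look roughly like this. It is a sketch under assumptions: the table is kept as two parallel arrays indexed by code, the first stored code is 0x100, and lzw_table and lzw_expand are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define LZW_FIRST_CODE 0x100u  /* codes below this are literal bytes */
#define LZW_MAX_CODES  4096u   /* assumed 12-bit code limit */

/* Parallel arrays indexed by code; slots below LZW_FIRST_CODE are unused. */
struct lzw_table {
    unsigned short prefix[LZW_MAX_CODES];
    unsigned char  ch[LZW_MAX_CODES];
};

/* Expand `code` into out[] in forward order; returns the bytes written. */
size_t lzw_expand(const struct lzw_table *t, unsigned code,
                  unsigned char *out, size_t out_cap)
{
    unsigned char stack[LZW_MAX_CODES];
    size_t sp = 0, n = 0;

    /* Walk the chain: each step yields the LAST byte not yet retrieved. */
    while (code >= LZW_FIRST_CODE) {
        assert(code < LZW_MAX_CODES);
        assert(sp + 1 < sizeof stack);  /* guards against a malformed table */
        stack[sp++] = t->ch[code];
        code = t->prefix[code];
    }
    stack[sp++] = (unsigned char)code;  /* code < 0x100: first byte of the string */

    /* Pop to reverse the order; stops early if out[] is too small. */
    while (sp > 0 && n < out_cap)
        out[n++] = stack[--sp];
    return n;
}
```

With the three chained entries for "abcd" stored at 0x100 to 0x102, expanding 0x102 pushes 'd', 'c', 'b', 'a' and pops them back out as "abcd".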

There's also a special case where the decoder receives a code that will be the next code to put in the dictionary, but it's not there yet. This is handled by dictionary[next code] = (previous code | first character of the previous code's string). Because the decoder walks each chain backwards, that first character is the last one it retrieves. The previous code and that character have to be saved after each decoded string to handle this case.
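Here is a hedged sketch of a whole decode loop with that special case in it. It assumes the codes have already been unpacked into an array, that there are no control codes, that the first free code is 0x100, and that both sides simply stop adding entries once 4096 codes exist. lzw_decode is a hypothetical name, not from any library. The saved byte is the first byte of the previous string, which is the last one the backward walk retrieves.

```c
#include <assert.h>
#include <stddef.h>

#define LZW_FIRST_CODE 0x100u
#define LZW_MAX_CODES  4096u   /* assumed 12-bit code limit */

/* Decode codes[0..ncodes) into out[]; returns the number of bytes written. */
size_t lzw_decode(const unsigned short *codes, size_t ncodes,
                  unsigned char *out, size_t out_cap)
{
    unsigned short prefix[LZW_MAX_CODES];
    unsigned char  ch[LZW_MAX_CODES];
    unsigned char  stack[LZW_MAX_CODES];
    unsigned next_code = LZW_FIRST_CODE;
    unsigned prev = 0;        /* previous code */
    unsigned char first = 0;  /* first byte of the previous string */
    size_t n = 0, i;

    for (i = 0; i < ncodes; i++) {
        unsigned code = codes[i], cur = code;
        size_t sp = 0;

        /* A valid code is already known, or is exactly the next one to add. */
        assert(code < next_code ||
               (i > 0 && code == next_code && code < LZW_MAX_CODES));

        if (code == next_code) {
            /* Special case: string = previous string + its own first byte. */
            stack[sp++] = first;
            cur = prev;
        }
        while (cur >= LZW_FIRST_CODE) {  /* backward walk onto the stack */
            stack[sp++] = ch[cur];
            cur = prefix[cur];
        }
        stack[sp++] = (unsigned char)cur;
        first = (unsigned char)cur;      /* first byte of the current string */

        if (i > 0 && next_code < LZW_MAX_CODES) {
            /* New entry: previous string + first byte of the current one. */
            prefix[next_code] = (unsigned short)prev;
            ch[next_code] = first;
            next_code++;
        }
        while (sp > 0) {                 /* pop in forward order */
            assert(n < out_cap);
            out[n++] = stack[--sp];
        }
        prev = code;
    }
    return n;
}
```

The input "abababa" encodes to 'a', 'b', 0x100, 0x102, and the last code arrives before the decoder has created entry 0x102, so that sequence exercises the special case.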

There are usually control codes as well. If there are, say, 8 of them, the compressor adds 8 to each non-control code and the decompressor subtracts 8 from each non-control code.

The compressed stream may consist of codes stored in big-endian or little-endian format. For big-endian format, each byte from the compressed stream goes into the low-order byte of a working "register", which is shifted left before picking up a new byte. For little-endian format, each byte from the compressed stream goes into the high-order portion of a working "register", and the "register" is shifted right after picking up a new byte.

The encoder needs some method to search the dictionary for a match to (code | char); the decoder only indexes the dictionary by code, so it can get by without a search. Some type of hash function can help here. Hardware implementations will use a content-addressable (associative) memory.
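One possible way to do the (code | char) search in software is an open-addressing hash table keyed on the pair. The sketch below is only that: the table size, the multiplicative hash constant, and every name in it (lzw_hash, lzw_hash_find, lzw_hash_add, lzw_encode) are assumptions for illustration. It also assumes no control codes, a first free code of 0x100, and that the dictionary just stops growing at 4096 codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LZW_FIRST_CODE 0x100u
#define LZW_MAX_CODES  4096u
#define HASH_SIZE      8192u    /* power of two, over 2x the entries */
#define HASH_EMPTY     0xFFFFu  /* marks an unused slot */

struct lzw_hash {
    unsigned short code[HASH_SIZE];    /* HASH_EMPTY or the code stored here */
    unsigned short prefix[HASH_SIZE];
    unsigned char  ch[HASH_SIZE];
    unsigned next_code;
};

void lzw_hash_init(struct lzw_hash *h)
{
    memset(h->code, 0xFF, sizeof h->code);  /* every slot becomes HASH_EMPTY */
    h->next_code = LZW_FIRST_CODE;
}

/* Slot holding (prefix | ch), or the empty slot where it would be stored. */
unsigned lzw_hash_slot(const struct lzw_hash *h, unsigned prefix, unsigned char ch)
{
    uint32_t key = ((uint32_t)prefix << 8) | ch;
    unsigned i = (unsigned)((uint32_t)(key * 2654435761u) >> 19);  /* top 13 bits */

    while (h->code[i] != HASH_EMPTY &&
           (h->prefix[i] != prefix || h->ch[i] != ch))
        i = (i + 1) & (HASH_SIZE - 1);  /* linear probing */
    return i;
}

/* Returns the code for (prefix | ch), or -1 if it is not in the dictionary. */
int lzw_hash_find(const struct lzw_hash *h, unsigned prefix, unsigned char ch)
{
    unsigned i = lzw_hash_slot(h, prefix, ch);
    return h->code[i] == HASH_EMPTY ? -1 : (int)h->code[i];
}

/* Adds (prefix | ch); returns its code, or -1 if the dictionary is full. */
int lzw_hash_add(struct lzw_hash *h, unsigned prefix, unsigned char ch)
{
    unsigned i;

    if (h->next_code >= LZW_MAX_CODES)
        return -1;
    i = lzw_hash_slot(h, prefix, ch);
    assert(h->code[i] == HASH_EMPTY);  /* caller must not add a duplicate */
    h->code[i] = (unsigned short)h->next_code;
    h->prefix[i] = (unsigned short)prefix;
    h->ch[i] = ch;
    return (int)h->next_code++;
}

/* Encode in[0..n) into out[] (room for n codes); returns the code count. */
size_t lzw_encode(struct lzw_hash *h, const unsigned char *in, size_t n,
                  unsigned short *out)
{
    size_t count = 0, i;
    unsigned cur;

    if (n == 0)
        return 0;
    cur = in[0];
    for (i = 1; i < n; i++) {
        int found = lzw_hash_find(h, cur, in[i]);
        if (found >= 0) {            /* longer match exists: keep extending */
            cur = (unsigned)found;
            continue;
        }
        out[count++] = (unsigned short)cur;
        lzw_hash_add(h, cur, in[i]); /* silently skipped once the dictionary is full */
        cur = in[i];
    }
    out[count++] = (unsigned short)cur;
    return count;
}
```

The table is kept under half full so probe sequences stay short and always reach an empty slot. Compared with a linear scan over all entries, this keeps each lookup close to constant time, which matters because the encoder does one lookup per input byte.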

Do a web search to see if you can find actual code examples.

LZW is a derivative of LZ78. Note that LZ77 and its derivatives are simpler to implement, and in the case of x86 program files, LZ77 "moving window" compression algorithms do a better job. See the Wikipedia articles on LZ77 and LZ78.

rcgldr