
I'm currently studying different compression algorithms such as Huffman, adaptive Huffman, and the Lempel-Ziv algorithms, but I don't really understand how they are supposed to work on an arbitrary (binary) file.

I know that they work on text files, but is that the only thing they work on? Can I use Huffman to compress an audio file or an image, and if so, how do I choose the size of the "blocks" the algorithm will operate on?

  • Since this question is so vague, it's better suited for [Quora](https://www.quora.com). Stack Overflow focuses on questions that pertain to actual code. – tadman Feb 28 '18 at 21:32

2 Answers

3

Huffman and adaptive Huffman are examples of coding, which takes advantage of a statistical skew in the probabilities of the symbols to code them into as few bits as possible. (There are other types of coding, such as arithmetic, range, and asymmetric numeral systems.)
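To make the coding step concrete, here is a toy Huffman code builder (the function name and representation are my own, not from any particular library); it assigns shorter bit strings to more frequent symbols, and works identically whether the symbols are characters or bytes:

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(data)
    # Heap entries: (weight, tiebreaker, {symbol: code-so-far}).
    # The tiebreaker keeps the heap from ever comparing two dicts.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        w1, i1, c1 = heapq.heappop(heap)  # two lightest subtrees
        w2, i2, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}   # left branch
        merged.update({s: "1" + c for s, c in c2.items()})  # right branch
        heapq.heappush(heap, (w1 + w2, min(i1, i2), merged))
    return heap[0][2]
```

For input like `"aaaabbc"`, the frequent symbol `a` gets a one-bit code while the rarer `b` and `c` get two bits each, and no code is a prefix of another.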

Lempel-Ziv is an example of modeling, which takes redundancy found in the particular kind of data being compressed, in this case text, and converts it into a series of symbols suitable for coding. Lempel-Ziv works on the assumption that strings of various lengths will often be repeated in the text, which is the case for natural languages.
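As a sketch of that modeling step, here is a greedy LZ77-style parse (a deliberately naive illustration, not how any production codec searches); repeated substrings become (offset, length) back-references, and everything else stays a literal:

```python
def lz77_tokens(data, window=4096, min_match=3):
    """Greedy LZ77-style parse: emit ("match", offset, length) for repeats,
    ("lit", symbol) otherwise. Real codecs use hash chains, not a linear scan."""
    i, out = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        # Search the sliding window for the longest earlier match.
        for j in range(max(0, i - window), i):
            k = 0
            # Overlapping matches are allowed, as in real LZ77.
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        if best_len >= min_match:
            out.append(("match", best_off, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out
```

On `"abcabcabc"` this emits three literals and then a single overlapping match that covers the remaining six characters; on data without repeated strings it degenerates to all literals, which is exactly why this model suits text but not raw audio samples.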

That assumption doesn't hold at all for audio or image files, where the redundancy takes very different forms. For those, transforms are applied to the data to separate out components by frequency as part of the modeling. Also, lossy compression is acceptable for audio and image data consumed by humans: data can be decimated or discarded depending on where it falls in the frequency domain, along with other ways to take advantage of psycho-acoustic or psycho-visual redundancy.
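To illustrate the frequency-separation idea, here is a toy (naive, O(N²)) DCT-II, the transform family behind JPEG and MP3-era codecs. For a smooth signal most of the energy lands in the first few coefficients, which is what makes decimating the rest viable:

```python
import math

def dct2(x):
    """Naive DCT-II: for smooth signals, energy concentrates in low-index
    coefficients, leaving the rest near zero and cheap to quantize away."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N))
                for n in range(N))
            for k in range(N)]
```

For a constant signal of length 8, the first coefficient carries all the energy and the other seven are (numerically) zero; real codecs then quantize the high-frequency coefficients coarsely or drop them outright.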

Once that sort of modeling is done, then similar coding can be applied to code the resulting symbols into a minimally sized stream of bits.

In short, compression consists of modeling, which is highly dependent on the type of data being compressed (and, in the lossy case, on the consumer of that data), followed by coding, which packs the resulting information into a minimal bit stream.

Mark Adler
0

Yes, the algorithms you mention all work equally well on binary files - it's just for convenience that most papers use character data in their examples.

As for block size: although it is not a requirement, modern general-purpose compression algorithms almost invariably treat the input as a stream of bytes (8-bit symbols).

Note that while you can in principle compress an audio file with Huffman coding, the outcome may be unrewarding, because Huffman relies on some symbols being more frequent than others. Special-purpose compression algorithms, such as the MPEG family, are typically used for audio.
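You can estimate in advance whether byte-wise Huffman will pay off by measuring the Shannon entropy of the byte distribution (the helper below is my own illustration): close to 8 bits per byte means the bytes are nearly uniform and Huffman has nothing to exploit, while skewed text scores much lower:

```python
import math
from collections import Counter

def byte_entropy(data):
    """Shannon entropy in bits per byte; ~8.0 means a byte-wise
    Huffman code cannot shrink the data meaningfully."""
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A skewed input like `b"aaaabbc"` comes out around 1.4 bits per byte (good candidate), whereas a uniform distribution over all 256 byte values scores exactly 8.0 (incompressible by byte-wise Huffman) - which is roughly what raw audio samples or already-compressed data look like.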