I tried to compress a ZIP file using an LZW compression method (the code is at the following link):

http://rosettacode.org/wiki/LZW_compression#C

The encoded file it produces is much longer than the original file. What is the reason for that? Could anybody help me understand what is actually happening?

  • What, you try to compress an already compressed file? That will most likely create a file bigger than the original, as it can't be compressed more, plus the compressor adds metadata to the output file. – Some programmer dude Oct 21 '13 at 10:59
  • which method is preferred to compress any binary data stream? – user2902744 Oct 21 '13 at 11:13
  • Binary is irrelevant. The key problem is that the ZIP compression removed (most of) the redundant data in the original file, so there is no more to compress. – Eric Postpischil Oct 21 '13 at 11:15
  • If you want to use your own compression scheme (for whatever reason), first *decompress* the original data, then *recompress* it with your own. But note that any existing LZW library will almost certainly out-compress your own code (with the correct parameters). – Jongware Oct 21 '13 at 15:34

1 Answer

It is impossible for a lossless compression method to compress every file to a shorter file.

This is because there are 256^N files that are N bytes long, but only (256^N - 1)/255 files that are shorter than N bytes. So not every N-byte file can be mapped to a distinct shorter file.
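To make the counting concrete, here is a small sketch that tallies both quantities for a few small values of N. The function names are made up for illustration; the only facts used are that there are 256 possible byte values and that the shorter files are those of length 0 through N-1.

```python
def files_of_length(n: int) -> int:
    """Number of distinct files that are exactly n bytes long."""
    return 256 ** n


def files_shorter_than(n: int) -> int:
    """Number of distinct files with length 0, 1, ..., n-1 bytes."""
    return sum(256 ** k for k in range(n))


for n in range(1, 5):
    exact = files_of_length(n)
    shorter = files_shorter_than(n)
    # The geometric series sums to the closed form quoted above.
    assert shorter == (256 ** n - 1) // 255
    # There are always fewer shorter files than files of length n.
    assert shorter < exact
    print(f"N={n}: {exact} files of length N, {shorter} shorter files")
```

For N = 2, for example, there are 65,536 two-byte files but only 257 shorter ones (the empty file plus 256 one-byte files), so most two-byte files have no shorter file left to map to.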

More than that, if any file becomes shorter, then some shorter file had to give up its spot to make that possible. So some files must become larger.

Lossless compression works by recognizing common patterns in typical files created by humans and converting long high-probability sequences of bytes to shorter sequences. The price for this is that some sequences become longer. The goal of the design is to make typical files compress, but atypical files must get longer.

If a compressor does its job, redundant information is removed from the file, and the output looks similar to random data. That output cannot then be compressed further.
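You can see this effect with a quick experiment. The sketch below uses Python's standard `zlib` module, which implements DEFLATE rather than the LZW code from the question, so treat it as an illustration of the general principle and not of that specific program. The sample text and vocabulary are invented for the demonstration.

```python
import random
import zlib

# Hypothetical sample input: 20,000 words drawn from a tiny vocabulary,
# so the text is highly redundant, much like typical human-written data.
rng = random.Random(0)
vocabulary = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta", "eta", "theta"]
text = " ".join(rng.choice(vocabulary) for _ in range(20000)).encode("ascii")

once = zlib.compress(text)   # first pass removes most of the redundancy
twice = zlib.compress(once)  # second pass is fed nearly random-looking bytes

print("original:        ", len(text))
print("compressed once: ", len(once))
print("compressed twice:", len(twice))
```

The first pass should shrink the text substantially. The second pass has almost nothing left to remove, so its output is typically no smaller than its input and often a few bytes larger because of the format's own header and checksum. The same thing happens when you feed an existing ZIP file to another compressor.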

Eric Postpischil