8

I have written a Java program for compression. I compressed some text files and the file size was reduced. But when I tried to compress a PDF file, I did not see any change in file size after compression.

So I want to know what other kinds of files will not reduce in size after compression.

Thanks Sunil Kumar Sahoo

16 Answers

12

File compression works by removing redundancy. Therefore, files that contain little redundancy compress badly or not at all.

The kind of file with no redundancy that you're most likely to encounter is one that has already been compressed. In the case of PDF, that would specifically be PDFs consisting mainly of images which are themselves in a compressed image format like JPEG.
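A quick way to see this is to deflate a highly redundant input next to a pseudo-random one with java.util.zip.Deflater (a minimal sketch; the class name and sizes are made up for illustration):

```java
import java.util.Random;
import java.util.zip.Deflater;

public class RedundancyDemo {
    // Deflate the input and return the length of the compressed output.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[input.length * 2 + 64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] redundant = "abc".repeat(10_000).getBytes();  // highly redundant
        byte[] random = new byte[30_000];
        new Random(42).nextBytes(random);                    // almost no redundancy

        // The redundant input collapses; the random input barely changes.
        System.out.println("redundant " + redundant.length + " -> " + deflatedSize(redundant));
        System.out.println("random    " + random.length + " -> " + deflatedSize(random));
    }
}
```

Random bytes stand in here for already-compressed data: a good compressor's output looks statistically random, which is exactly why a second pass finds nothing left to remove.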

Michael Borgwardt
  • 342,105
  • 78
  • 482
  • 720
6

JPEG/GIF/AVI/MPEG/MP3 and other already-compressed files won't change much after compression. You may see a small decrease in file size.

waqasahmed
  • 3,555
  • 6
  • 32
  • 52
5

Already-compressed files will not reduce in size when compressed again.

stefanw
  • 10,456
  • 3
  • 36
  • 34
4

Five years later, I have at least some real statistics to back this up.

I've generated 17439 multi-page PDF files with PrinceXML, totalling 4858 MB. Running zip -r archive pdf_folder gives me an archive.zip that is 4542 MB. That's 93.5% of the original size, so it's not worth it to save space.

Claes Mogren
  • 2,126
  • 1
  • 26
  • 34
3

The only files that cannot be compressed are random ones - truly random bits, or as approximated by the output of a compressor.

However, for any algorithm in general, there are many files that cannot be compressed by it but can be compressed well by another algorithm.

Will
  • 73,905
  • 40
  • 169
  • 246
2

PDF files are already compressed. They use the following compression algorithms:

  • LZW (Lempel-Ziv-Welch)
  • FLATE (ZIP, in PDF 1.2)
  • JPEG and JPEG2000 (PDF version 1.5)
  • CCITT (the facsimile standard, Group 3 or 4)
  • JBIG2 compression (PDF version 1.4)
  • RLE (Run Length Encoding)

Depending on which tool created the PDF and its version, different types of compression are used. You can compress it further using a more efficient algorithm, or lose some quality by converting images to low-quality JPEGs.

There is a great link on this here

http://www.verypdf.com/pdfinfoeditor/compression.htm

badbod99
  • 7,429
  • 2
  • 32
  • 31
  • Not really. Not all PDF files automatically store their content in compressed format. But you're right, PDF supports compression. Unless your PDF contains only images, there's a high probability that you could squeeze some extra space using ZIP or RAR – Salamander2007 Jul 16 '09 at 10:13
  • It 100% depends on the application which created the PDF, as mentioned in my post. – badbod99 Jul 21 '09 at 14:08
2

Files encrypted with a good algorithm like IDEA or DES in CBC mode don't compress any more, regardless of their original content. That's why encryption programs first compress and only then encrypt.
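A minimal sketch of why the order matters, using javax.crypto's AES in CBC mode and java.util.zip.Deflater (the class name, the fixed zero IV, and the sizes are purely for demonstration): the redundant plaintext deflates to almost nothing, while its ciphertext barely shrinks at all.

```java
import java.util.zip.Deflater;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.spec.IvParameterSpec;

public class EncryptThenCompress {
    // Deflate the input and return the length of the compressed output.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[input.length * 2 + 64];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) throws Exception {
        byte[] plain = "the quick brown fox ".repeat(2_000).getBytes(); // very redundant

        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        Cipher aes = Cipher.getInstance("AES/CBC/PKCS5Padding");
        // Fixed all-zero IV only for this demo; never reuse an IV in real code.
        aes.init(Cipher.ENCRYPT_MODE, keyGen.generateKey(), new IvParameterSpec(new byte[16]));
        byte[] cipherText = aes.doFinal(plain);

        // The plaintext shrinks dramatically; the ciphertext does not shrink at all.
        System.out.println("plaintext  " + plain.length + " -> " + deflatedSize(plain));
        System.out.println("ciphertext " + cipherText.length + " -> " + deflatedSize(cipherText));
    }
}
```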

sharptooth
  • 167,383
  • 100
  • 513
  • 979
1

Generally you cannot compress data that has already been compressed. You might even end up with a compressed size that is larger than the input.
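For instance, gzipping twice with java.util.zip.GZIPOutputStream typically shows both effects (a small sketch; the class name and sample data are made up): the first pass shrinks the redundant input dramatically, while the second pass only adds header overhead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class DoubleCompression {
    // Gzip the input and return the compressed bytes.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "some repetitive text ".repeat(5_000).getBytes();
        byte[] once  = gzip(original);  // much smaller than the original
        byte[] twice = gzip(once);      // typically larger than "once"

        System.out.println("original: " + original.length);
        System.out.println("once:     " + once.length);
        System.out.println("twice:    " + twice.length);
    }
}
```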

Martin Liversage
  • 104,481
  • 22
  • 209
  • 256
1

You will probably have difficulty compressing encrypted files too as they are essentially random and will (typically) have few repeating blocks.

Colin Desmond
  • 4,824
  • 4
  • 46
  • 67
0

Media files don't tend to compress well. JPEG and MPEG files won't compress further, though you may be able to compress .png files.

AutomatedTester
  • 22,188
  • 7
  • 49
  • 62
  • Actually JPEG and MPEG files can often be compressed a few percent by a good compression algorithm. – Michael Borgwardt Jul 16 '09 at 08:58
  • Are you sure? Remember that special-purpose compression algorithms often lose some data not important for the content (like noise in sound files or similar areas in images). That means they always have a better compression ratio than any general-purpose (mainly lossless) compression algorithm. – twk Jul 16 '09 at 09:05
  • But BMP files compress very well. This doesn't depend on the type of media, but on the compression type. And yes - file formats are a kind of compression of information. – smok1 Jul 16 '09 at 09:18
0

Files that are already compressed usually can't be compressed any further, for example MP3, JPG, FLAC, and so on. You could even end up with bigger files because of the header added by the re-compression.

Federico klez Culloca
  • 26,308
  • 17
  • 56
  • 95
0

Really, it all depends on the algorithm that is used. An algorithm that is specifically tailored to use the frequency of letters found in common English words will do fairly poorly when the input file does not match that assumption.

In general, PDFs contain images and the like that are already compressed, so they will not compress much further. Your algorithm is probably only able to eke out meagre savings, if any, from the text strings contained in the PDF.

Coxy
  • 8,844
  • 4
  • 39
  • 62
0

Simple answer: compressed files (otherwise we could reduce file sizes to 0 by compressing multiple times :). Many file formats already apply compression, and you might find that the file size shrinks by less than 1% when compressing movies, MP3s, JPEGs, etc.

soulmerge
  • 73,842
  • 19
  • 118
  • 155
0

You can add all Office 2007 file formats to the list (of @waqasahmed):

Since Office 2007 .docx and .xlsx (etc.) files are actually zipped .xml files, you might not see much size reduction in them either.

GvS
  • 52,015
  • 16
  • 101
  • 139
  • I created an Excel sheet with Python's XlsxWriter. When I resaved it with LibreOffice Calc, the size decreased by more than 60 percent. Why is that? – bakarin Sep 12 '18 at 06:32
0
  1. Truly random

  2. Approximation thereof, made by cryptographically strong hash function or cipher, e.g.:

    AES-CBC(any input)

    "".join(hashlib.md5(str(i).encode()).hexdigest() for i in range(...))

Dima Tisnek
  • 11,241
  • 4
  • 68
  • 120
0

Any lossless compression algorithm, provided it makes some inputs smaller (as the name compression suggests), will also make some other inputs larger.

Otherwise, the set of all input sequences up to a given length L could be mapped, without collisions (because the compression must be lossless and reversible), to the (much) smaller set of all sequences of length less than L; for example, there are 2^L bit sequences of length exactly L, but only 2^L - 1 of length less than L. The pigeonhole principle excludes that possibility.

So, there are infinitely many files which do NOT reduce in size after compression, and moreover a file doesn't need to be high-entropy for that to happen :)

Gianluca Ghettini
  • 11,129
  • 19
  • 93
  • 159