
I generated some high-resolution, publication-quality plots, for example:

library(plot3D)
Volcano <- volcano
zf <- 10  # zoom factor
tiff("Volcano.tif", width = 1800 * zf, height = 900 * zf, res = 175 * zf,
     compression = "lzw")
image2D(z = Volcano, clab = "height, m",
        colkey = list(dist = -0.20, shift = 0.15, side = 3, length = 0.5,
                      width = 0.5, cex.clab = 1.2, col.clab = "white",
                      line.clab = 2, col.axis = "white", col.ticks = "white",
                      cex.axis = 0.8))
dev.off()

The resulting file is 22 MB.

Now I open the file in GIMP and, without changing the resolution or anything else, export it as "Volcano gimp.tif". The file GIMP generates is 1.9 MB.

ImageMagick reports similar image stats:

$ identify Volcano.tif
Volcano.tif TIFF 18000x9000 18000x9000+0+0 8-bit DirectClass 22.37MB 0.000u 0:00.000
$ identify "Volcano gimp.tif"
Volcano gimp.tif TIFF 18000x9000 18000x9000+0+0 8-bit DirectClass 1.89MB 0.000u 0:00.000

Even with `identify -verbose`, the two files appear similar.

What is the difference between these files? Why do they have such different file sizes?

UPDATE: OK, things are getting crazier. I did the same thing with IrfanView and got yet another file size. The starting file is the Volcano.tif generated by R with compression="lzw". Note how "Volcano irfan.tif" and "Volcano gimp.tif" differ in size while all other stats are the same: memory footprint, DPI, colors, and resolution are identical; only the size on disk differs.


UPDATE 2: Adobe Photoshop saves the file down to 2.6 MB


WinRAR reports that the original R-generated TIFF is highly compressible (from 22 MB down to 3.6 MB).

UPDATE 3: This issue might be similar to Montage / Join 2 TIFF images in a 2 col x 1 row tile without losing quality

UPDATE 4: The R generated TIFF file can be found here http://ge.tt/7ZvRd4C1/v/0?c

ECII
  • There seems to be something amiss with the `tiff` function. On my Win7 machine (with a slightly out-of-date v2.15.2), R won't create a valid image file at all using compression `rle`, `jpeg` or `zip`. Will investigate further on a different machine later. In the meantime, try playing around with the `tiff` options and see if you can replicate my odd behaviour. There could be a bug buried here. – Richie Cotton Jan 02 '14 at 15:28
  • `compression="zip"` crashes my session! – ECII Jan 02 '14 at 15:34
  • Using LZW with and without the predictor option on 24-bpp data can make a huge difference in the compression ratio (like you are observing). Post the TIFFs somewhere I can download them and I will tell you why they are different sizes. – BitBank Jan 02 '14 at 18:13
  • Here is the R-generated TIFF file: http://ge.tt/7ZvRd4C1/v/0?c – ECII Jan 02 '14 at 18:44
  • The R-generated TIFF file is not using the TIFF predictor. This causes the terrible compression when working with 24-bpp data, since LZW compression works 8 bits at a time. The predictor allows the constant-color sections to "cancel each other out", become black, and compress much better. – BitBank Jan 02 '14 at 18:58
  • OK, thanks for the info. What does this mean practically? Is the problem solely in the compression? Should I output as uncompressed and then compress with GIMP? Also, please make this an answer rather than a comment (it would be helpful to include some more details; I am considering filing this as a bug). – ECII Jan 02 '14 at 19:00
  • In the future, you can use my TIFFTOOL to see all the details of why those files were different: http://bitbanksoftware.com/tinytools.html – BitBank Jan 02 '14 at 20:08
  • I just published an OSX version of my TIFFTOOL for those of you who don't use Windows: https://itunes.apple.com/us/app/tifftool/id955437526?mt=12 – BitBank Feb 05 '15 at 13:17
  • The issue seems to be resolved when using compression="lzw+p". – ECII Aug 07 '18 at 06:22

1 Answer


Apparently, the TIFF LZW compressor used by R does not use an important option, the TIFF predictor, and this leads to an extremely large file. Data compression works best when it can recognize redundancies in the data. Here the image data consists of 24-bit (3-byte) pixels holding 8-bit red, green, and blue values. Standard LZW compression scans a stream of bytes for repeating patterns. Treating the color image as a plain byte stream, it sees repeating 3-byte patterns rather than runs of a constant color.

Enabling the TIFF predictor applies a differencing filter that stores each pixel as the delta from its left-hand neighbor. If neighboring pixels are the same color, it stores 0s. A long run of 0s compresses much better than repeating non-zero patterns that are at least 3 bytes long.

Here is an example of how it works on a 6-pixel line. When encoding, the predictor starts from the right edge and works left along each scan line, so every pixel is differenced against its original, unmodified left neighbor:

Original data:
2A 50 40 2A 50 40 2A 50 40 2A 50 40 2A 50 40 2A 50 40 (6 pixels of the same color)

After horizontal differencing (TIFF predictor):
2A 50 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The data is much more compressible after the predictor since long runs of the same value (0x00) are easier for LZW to compress.
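
The 6-pixel example above can be reproduced with a few lines of R. This is only a minimal sketch of horizontal differencing on interleaved 8-bit RGB samples, not the code R or libtiff actually runs; the function names `predict_row` and `unpredict_row` and the `spp` (samples per pixel) parameter are made up for illustration.

```r
# Illustrative sketch only: horizontal differencing on one scan line of
# interleaved 8-bit samples. `spp` = samples per pixel (3 for RGB).
predict_row <- function(samples, spp = 3L) {
  n <- length(samples)
  if (n <= spp) return(samples)
  out <- samples
  idx <- (spp + 1L):n
  # Each sample minus the matching sample of the pixel to its left, modulo 256.
  # Reading from the untouched input is equivalent to working right-to-left
  # in place.
  out[idx] <- (samples[idx] - samples[idx - spp]) %% 256L
  out
}

# Inverse filter: a running sum from left to right restores the original bytes.
unpredict_row <- function(encoded, spp = 3L) {
  out <- encoded
  n <- length(out)
  if (n > spp) {
    for (i in (spp + 1L):n) out[i] <- (out[i] + out[i - spp]) %% 256L
  }
  out
}

# Six pixels of the same color (2A 50 40), as in the example above.
scanline <- rep(c(0x2AL, 0x50L, 0x40L), times = 6)
encoded  <- predict_row(scanline)

cat(sprintf("%02X", encoded), "\n")
# 2A 50 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

# The filter is reversible, which is why the predictor keeps LZW lossless.
stopifnot(identical(unpredict_row(encoded), scanline))
```

Because the inverse filter recovers the input exactly, the predictor changes only how compressible the byte stream is, not the pixel data.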

Conclusion: This should be filed as a bug against the maintainers of the R compression code, since using LZW on full-color images without the predictor produces poor results. In the meantime, a workaround is needed to compress the files more efficiently.
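
A later comment on the question reports that `compression="lzw+p"` resolves the issue. A quick way to check this on your own setup might look like the sketch below. It assumes an R version whose `tiff()` device accepts `"lzw+p"` (LZW with the predictor), and it uses a smaller canvas and base `image()` in place of `image2D()` so that it runs quickly; the helper name `render_tiff` is made up for illustration.

```r
# Sketch: render the same plot with plain LZW and with LZW + predictor,
# then compare the resulting file sizes in bytes.
# Assumes tiff() supports compression = "lzw+p" in your R version.
render_tiff <- function(compression) {
  path <- tempfile(fileext = ".tif")
  tiff(path, width = 1800, height = 900, res = 175, compression = compression)
  image(volcano)  # base-graphics stand-in for the image2D() call
  dev.off()
  file.size(path)
}

sizes <- vapply(c("lzw", "lzw+p"), render_tiff, numeric(1))
print(sizes)
```

If the predictor variant is markedly smaller on your machine, switching the `compression` argument in the original `tiff()` call should avoid the round trip through GIMP.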

BitBank
  • Excellent. Thank you. I filed a bug: https://bugs.r-project.org/bugzilla/show_bug.cgi?id=15626 . What should I do in the meantime? Should I save uncompressed TIFFs and then compress them with GIMP or ImageMagick, or save the plots as PNG and then convert them to TIFF? – ECII Jan 02 '14 at 19:32
  • PNG should get you the smallest file since it takes advantage of both horizontal and vertical symmetries. Uncompressed TIFFs would take up huge amounts of disk space, so even the poorly compressed ones would be a better choice. The choice of final file format depends on what software will be opening them. They're all using lossless compression, so the original data is preserved. – BitBank Jan 02 '14 at 19:36
  • What happens when I take the poorly compressed TIFF generated by R, open it, and save it with GIMP? Does the LZW compression work properly? Is this lossless? Also, is PNG->TIFF lossless? (My publisher requires TIFF.) – ECII Jan 02 '14 at 19:40
  • PNG and TIFF LZW are lossless (with or without the predictor). All of the file conversions you plan to use will result in identical output, so the only difference will be the file size. – BitBank Jan 02 '14 at 19:42
  • Thanks for the very knowledgeable answer! – Ben Bolker Jan 02 '14 at 20:20