I'm using iText version 5.5.6 (tested also 5.3.4) with Java 7 (1.7.0_71) 64bit on Windows 7

Here's the example code:

import java.io.FileOutputStream;

import org.junit.Test;

import com.itextpdf.text.Document;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;

@Test
public void testConvert() throws Exception {
    try {
        // Read the TIFF file
        RandomAccessFileOrArray myTiffFile = new RandomAccessFileOrArray("C:\\local\\docs\\test.01.tif");
        // Find the number of images in the TIFF file
        int numberOfPages = TiffImage.getNumberOfPages(myTiffFile);
        System.out.println("Number of Images in Tiff File: " + numberOfPages);
        Document TifftoPDF = new Document();
        PdfWriter.getInstance(TifftoPDF, new FileOutputStream("C:\\local\\docs\\test.01.pdf"));
        TifftoPDF.open();
        // Loop over the pages (1-based), extract each one
        // into an Image object and add it to the PDF
        for (int i = 1; i <= numberOfPages; i++) {
            //*******
            //******* this next line is generating the error
            //*******
            Image tempImage = TiffImage.getTiffImage(myTiffFile, i);
            TifftoPDF.add(tempImage);
        }
        TifftoPDF.close();
        System.out.println("Tiff to PDF Conversion in Java Completed");
    }
    catch (Exception i1) {
        i1.printStackTrace();
    }
}

It generates the following error:

java.lang.ClassCastException
    at com.itextpdf.text.pdf.codec.TIFFField.getAsInt(TIFFField.java:315)
    at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:163)
    at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:315)
    at com.itextpdf.text.pdf.codec.TiffImage.getTiffImage(TiffImage.java:303)
    at com.pdf.ImageConverterImplIT.testConvert(ImageConverterImplIT.java:116)
Timm-ah
  • The fact that you can open a TIFF in an image viewer doesn't always mean that such a TIFF is valid. If you want to ignore errors in the TIFF (which is what many image viewers do), please read the answer to the question [Exception when converting tiff file to pdf file with iText](http://stackoverflow.com/questions/29787388/exception-when-converting-tiff-file-to-pdf-file-with-itext) – Bruno Lowagie May 11 '15 at 15:13
  • attempted the following, replacing the problem line, with the same results:`Image tempImage=TiffImage.getTiffImage(myTiffFile, i, true);` and `Image tempImage=Image.getInstance("C:\\local\\docs\\test.01.tif",true);` – Timm-ah May 11 '15 at 15:24
  • Then there's something else wrong with that specific TIFF. It works with other TIFFs, doesn't it? If you want this to be fixed, you'll have to share the "bad" TIFF for inspection. – Bruno Lowagie May 12 '15 at 06:16
  • Bruno...thanks for the assist!!! Here's a link to one of my [bad files](https://db.tt/xwLNta8H) – Timm-ah May 12 '15 at 09:54

2 Answers

I'm going to go into a deep dive of hex surgery on your file, the cause of the exception in iText and ultimately the cause of this bug. I will then go on a screed describing why this happens.

Your file is structured so that the primary IFD is at the end of the file. Here is the file header:

49 49 2A 00 96 6C 00 00 
intel magic offset-----

Which says "I'm a TIFF in Intel (little-endian) byte order and my primary IFD starts at offset 0x6c96."

If you skip ahead to this spot you see this:

0F 00 <- this is the total number of tags, each tag is 12 bytes

#  |  ID |Type | Count     | Value     |
01. 00 01 04 00 01 00 00 00 A2 06 00 00 width = 6a2
02. 01 01 04 00 01 00 00 00 4A 04 00 00 height = 44a
03. 02 01 03 00 01 00 00 00 01 00 00 00 bits per sample = 1
04. 03 01 03 00 01 00 00 00 04 00 00 00 Compression = CCITT G4
05. 06 01 03 00 01 00 00 00 00 00 00 00 Photometric = min is white
06. 0A 01 04 00 01 00 00 00 01 00 00 00 Fill order = msb to lsb
07. 11 01 04 00 01 00 00 00 08 00 00 00 Offset of strips = 8
08. 15 01 03 00 01 00 00 00 01 00 00 00 Samples per pixel = 1
09. 16 01 04 00 01 00 00 00 4A 04 00 00 Rows per strip = 44a
0a. 17 01 04 00 01 00 00 00 5B 6C 00 00 Strip byte counts = 6c5b
0b. 1A 01 05 00 01 00 00 00 63 6C 00 00 Offset to x resolution = 6c63
0c. 1B 01 05 00 01 00 00 00 6B 6C 00 00 Offset to y resolution = 6c6b
0d. 1C 01 03 00 01 00 00 00 01 00 00 00 Planar Config = Contiguous
0e. 28 01 03 00 01 00 00 00 02 00 00 00 Resolution unit = inches
0f. 31 01 02 00 23 00 00 00 73 6C 00 00 Software string offset = 6c73
Location of next IFD, 0 means no more
00 00 00 00 
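
To make the layout above concrete, here is a minimal sketch of decoding the 8-byte header and a single 12-byte IFD entry. The class and method names (TiffIfdDump, firstIfdOffset, describeEntry) are made up for illustration and are not part of iText. The last four bytes of an entry are printed raw, without interpreting them as an inline value or as an offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative sketch only; these names are hypothetical and not part of iText. */
class TiffIfdDump {

    /** Returns the byte order declared by the first two header bytes ("II" or "MM"). */
    static ByteOrder byteOrder(byte[] tiff) {
        if (tiff[0] == 'I' && tiff[1] == 'I') return ByteOrder.LITTLE_ENDIAN;
        if (tiff[0] == 'M' && tiff[1] == 'M') return ByteOrder.BIG_ENDIAN;
        throw new IllegalArgumentException("Not a TIFF: bad byte-order mark");
    }

    /** Returns the offset of the first IFD, an unsigned 32-bit value stored at byte 4. */
    static long firstIfdOffset(byte[] tiff) {
        ByteBuffer buf = ByteBuffer.wrap(tiff).order(byteOrder(tiff));
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Not a TIFF: bad magic number");
        }
        return buf.getInt(4) & 0xFFFFFFFFL;
    }

    /** Formats the 12-byte IFD entry that starts at entryOffset. */
    static String describeEntry(byte[] tiff, int entryOffset) {
        ByteBuffer buf = ByteBuffer.wrap(tiff).order(byteOrder(tiff));
        int tag = buf.getShort(entryOffset) & 0xFFFF;          // 2 bytes: tag ID
        int type = buf.getShort(entryOffset + 2) & 0xFFFF;     // 2 bytes: data type
        long count = buf.getInt(entryOffset + 4) & 0xFFFFFFFFL; // 4 bytes: value count
        long raw = buf.getInt(entryOffset + 8) & 0xFFFFFFFFL;   // 4 bytes: value or offset
        return String.format("tag=0x%04x type=%d count=%d value/offset=0x%x",
                tag, type, count, raw);
    }
}
```

Fed the header bytes shown earlier, firstIfdOffset returns 0x6c96, and describeEntry on the fill order row reports type 4, which is the detail that matters below.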

Now, looking at the call stack and tracing it back to source, I see a call being made to get the fill order. Fill order for 1 bit files describes whether the high order bit or the low order bit in a byte is leftmost in display.

TIFFField fillOrderField =  dir.getField(TIFFConstants.TIFFTAG_FILLORDER);
if (fillOrderField != null)
    fillOrder = fillOrderField.getAsInt(0);

We know that this will get called since there is a fill order tag in your IFD, which is a 4 byte integer with value 1.

Unfortunately for you, that call to TIFFField.getAsInt(0) is causing a failure.

If you look at that code:

public int getAsInt(int index) {
    switch (type) {
    case TIFF_BYTE: case TIFF_UNDEFINED:
        return ((byte[])data)[index] & 0xff;
    case TIFF_SBYTE:
        return ((byte[])data)[index];
    case TIFF_SHORT:
        return ((char[])data)[index] & 0xffff;
    case TIFF_SSHORT:
        return ((short[])data)[index];
    case TIFF_SLONG:
        return ((int[])data)[index];
    default:
        throw new ClassCastException();
    }
}

You can see that it throws a ClassCastException if the type doesn't match any of the cases, and here it will: the type constants in the cases are 1, 7, 6, 3, 8 and 9 respectively, and the tag's type is 4.

So why is the code wrong?

The problem is that, although the spec is pretty clear that the FillOrder tag (0x10a) should be an unsigned short (type 3), the tag in your file is an unsigned 4-byte int (type 4), and the switch statement doesn't account for that: there is no case for TIFF_LONG.

Why is there no case for this? Looking at the surrounding code, this library represents 4-byte unsigned integers with the Java type 'long'. Treating a 4-byte unsigned int as a 4-byte signed int could overflow into the sign bit (even though none of the legal values for this tag would trigger that), so because the cast might cause an error, the library always treats it as one.

Ultimately, this bug has two causes:

  1. Java only has precisely one unsigned integer type (char, for those of you playing along at home) and this library chose to use long to represent an unsigned 4 byte int.
  2. This particular file is out of spec and used unsigned int for this tag

Or more specifically, there is an impedance mismatch between the chosen Java types and this TIFF file. The field code is attempting to be strongly typed. The calling code is attempting to accept a wide variety of types. It missed this one case.
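
The mismatch can be shown outside iText. The following is a sketch under stated assumptions: FieldValue is a hypothetical stand-in (not iText's TIFFField) that stores unsigned values widened to long, and its lenient accessor rejects only values that genuinely do not fit in an int, rather than rejecting the LONG type outright.

```java
/** Hypothetical stand-in for a parsed TIFF field; this is not iText's TIFFField. */
class FieldValue {
    static final int TIFF_SHORT = 3;
    static final int TIFF_LONG = 4;

    final int type;
    final long[] values; // unsigned values, widened to long so no bit lands in the sign

    FieldValue(int type, long... values) {
        this.type = type;
        this.values = values;
    }

    /** Reinterprets a raw 32-bit pattern as an unsigned value. */
    static long unsigned32(int raw) {
        return raw & 0xFFFFFFFFL;
    }

    /** Narrows to int for both SHORT and LONG fields, failing only on real overflow. */
    int getAsIntLenient(int index) {
        long v = values[index];
        if (v < 0 || v > Integer.MAX_VALUE) {
            throw new ArithmeticException("Value does not fit in an int: " + v);
        }
        return (int) v;
    }
}
```

With this approach a fill order of 1 stored as a LONG reads back as 1, while a value such as 0xFFFFFFFF still fails loudly instead of silently turning negative.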

I looked at my own tag code for grins to see if it would suffer from this particular problem. The answer is no as my version of getIntValue() will let you overflow into the sign bit if that's what you want to do.

So the real fix is to change the code to:

TIFFField fillOrderField =  dir.getField(TIFFConstants.TIFFTAG_FILLORDER);
if (fillOrderField != null)
    fillOrder = (int)fillOrderField.getAsLong(0);

or, alternatively, to perform hex surgery on your file and change the data type of the fill order tag to unsigned short. This is ultimately a poor solution, as the consuming code remains susceptible to other bad TIFF files.
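
If you do go the hex surgery route, it can be done programmatically rather than in a hex editor. This is a minimal sketch, not a hardened tool: FillOrderPatcher and patchFillOrder are hypothetical names, it inspects only the first IFD, and it assumes a well-formed header and an IFD offset that fits in a signed int. It works on an in-memory copy of the file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative sketch only; these names are hypothetical and not part of iText. */
class FillOrderPatcher {
    static final int TAG_FILL_ORDER = 0x010A;
    static final int TYPE_SHORT = 3;
    static final int TYPE_LONG = 4;

    /**
     * Rewrites a LONG FillOrder entry in the first IFD as a SHORT, in place.
     * Returns true if an entry was patched, false if nothing needed changing.
     */
    static boolean patchFillOrder(byte[] tiff) {
        ByteOrder order = (tiff[0] == 'I') ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        ByteBuffer buf = ByteBuffer.wrap(tiff).order(order);
        int ifd = buf.getInt(4);                  // assumes the offset fits in a signed int
        int entries = buf.getShort(ifd) & 0xFFFF; // entry count precedes the entries
        for (int i = 0; i < entries; i++) {
            int entry = ifd + 2 + i * 12;         // each entry is 12 bytes
            int tag = buf.getShort(entry) & 0xFFFF;
            int type = buf.getShort(entry + 2) & 0xFFFF;
            int count = buf.getInt(entry + 4);
            if (tag == TAG_FILL_ORDER && type == TYPE_LONG && count == 1) {
                int value = buf.getInt(entry + 8);
                buf.putShort(entry + 2, (short) TYPE_SHORT);
                // A SHORT is left-justified in the 4-byte value field, so rewrite
                // the value too; this matters for big-endian files.
                buf.putShort(entry + 8, (short) value);
                buf.putShort(entry + 10, (short) 0);
                return true;
            }
        }
        return false;
    }
}
```

One way to use it would be to load the file with java.nio.file.Files.readAllBytes, call patchFillOrder, and write the result to a new file so the original stays untouched. A multi-page TIFF would also need the next-IFD chain followed, which this sketch does not do.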


Gratuitous Screed

One thing I've learned in the past 10 years of working with TIFF files is that there is no shortage of broken TIFF files and no shortage of engineers who either didn't read the spec or failed to implement it correctly, making new broken files (and once in a while, I have been that engineer). A number of these are grad students who need TIFF output RIGHT NOW and write a quick and dirty (broken) encoder, which they consider correct when IrfanView can open their output (which is an invalid test, since IrfanView, and my TIFF codec as well, open a wide variety of fundamentally broken TIFFs).

The TIFF specification is deceptively straightforward. I say that because the format itself feels like it should be relatively easy to generate. Tags are logical, IFDs are simple collections of tags, and pointer tags can be tricky but are manageable. What happens is that code gets written without a level of abstraction that would prevent whole classes of errors from slipping through.

This particular file was not written by a grad student. At least I don't think so.

In this case, the problem was likely caused by fCoder. We know this because they put that into the Software string: "Created by fCoder Graphics Processor". I'm calling them out because they put in the software string to identify themselves. This bug (an incorrect type, likely due to a copy-paste error in their source), while minor, is causing problems, and maybe they'll fix it. In my world, the #1 top-priority-drop-everything bug is "generates a bad file," and if I had done this, I sure as hell would want to know so I could fix my code. In the meantime, iText should also update their code to accept this class of file.

Lessons learned:

  1. The specification is the answer to the question "is my file correct."
  2. It's hard to write a decent TIFF encoder or decoder. Consider a commercial library before writing your own (although in this example, we found bugs in not one but two commercial libraries).
  3. Put in the software string when you generate your file so we can contact you when there is an issue.

Here endeth the lesson.

plinth
  • Great answer. I just saw a reference to it from one of our engineers in our internal issue tracker. We'll fix this in iText. It feels like we've been fixing TIFF problems for ages. – Bruno Lowagie Jun 22 '15 at 16:05
  • Bad news, @Bruno, but you're going to be fixing TIFF issues forever. Or good news because you'll have employment forever. [See also](http://content.atalasoft.com/h/i/83021832-designing-file-formats) – plinth Jun 22 '15 at 17:01

It's Mikhael Bolgov from fCoder.

We've checked the bad file linked in one of the first messages. There is a line in its structure:

0131.H Software               ASCII 35 "Created by fCoder Graphic Processor"

Please note that it's named fCoder Graphic Processor. We used to write it that way till about 2005-2006. In newer versions it's "fCoder Graphics Processor".

So our processor might have created a file with this mistake. But it would have been a very, very old version.

Here is an example of a file created with the latest version of our 2TIFF software, which runs on the latest version of our processor:

Header
Byte order = Littleendian
Version  = 2A.H, TIFF 6.0
First IFD = 8.H
End of header

[Root IFD] 00000008.H
00FE.H New subfile type                      LONG 1 (0.H) [Full resolution image]
0100.H Image width                           LONG 1 280
0101.H Image height                          LONG 1 560
0102.H Bits per sample                       SHORT 1 1
0103.H Compression                           SHORT 1 (0004.H) CCITT Group 4/ T.6/ MMR
0106.H Photometric interpretation            SHORT 1 Black is zero
010A.H Fill order                            SHORT 1 1
0111.H Strip offsets                         LONG 3 [206, 16804, 35502]
0115.H Samples per pixel                     SHORT 1 1
0116.H Rows per strip                        LONG 1 234
0117.H Strip byte counts                     LONG 3 [16598, 18698, 3915]
011A.H X resolution                          RATIONAL 1 96 (96 / 1 = 96)
011B.H Y resolution                          RATIONAL 1 96 (96 / 1 = 96)
011C.H Planar configuration                  SHORT 1 Single plane
0128.H Resolution unit                       SHORT 1 Inch
0131.H Software                              ASCII 37 "Created by fCoder Graphics Processor"
[Next IFD] 00000000.H

Root pages = 1
Total pages = 1

So once again: new versions of our graphics processor create correct TIFF files, and most likely they have done so for the last 10 years.

  • I was receiving the file from a client whose scanning software generates the TIFF files, so clearly that scanning software has the issue. My solution was to just copy the bad files via the Apache Commons Imaging library, using `Imaging.getBufferedImage(file)`. Thanks again for the assist – Timm-ah Oct 29 '15 at 01:35