I'm going to do a deep dive: hex surgery on your file, the cause of the exception in iText, and ultimately the cause of this bug. I will then go on a screed describing why this happens.
Your file is structured so that the primary IFD is at the end of the file. Here is the file header:
49 49 2A 00 96 6C 00 00
intel magic offset-----
Which says "I'm a TIFF in Intel (little endian) byte order and my primary IFD starts at offset 0x6c96."
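If you'd rather not decode that by eye, here is a minimal Java sketch that reads the same eight header bytes. The class and method names (`TiffHeaderDemo`, `firstIfdOffset`) are mine for illustration and are not part of iText.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of iText.
class TiffHeaderDemo {
    /** Returns the offset of the first IFD, or throws if this is not a TIFF header. */
    static long firstIfdOffset(byte[] header) {
        if (header.length < 8) {
            throw new IllegalArgumentException("a TIFF header is 8 bytes");
        }
        ByteOrder order;
        if (header[0] == 'I' && header[1] == 'I') {
            order = ByteOrder.LITTLE_ENDIAN;   // "II" = Intel
        } else if (header[0] == 'M' && header[1] == 'M') {
            order = ByteOrder.BIG_ENDIAN;      // "MM" = Motorola
        } else {
            throw new IllegalArgumentException("bad byte-order mark");
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(order);
        if ((buf.getShort(2) & 0xffff) != 42) {
            throw new IllegalArgumentException("bad magic number");
        }
        // The offset is an unsigned 32-bit value, so widen it to long.
        return buf.getInt(4) & 0xffffffffL;
    }

    public static void main(String[] args) {
        byte[] header = {0x49, 0x49, 0x2A, 0x00, (byte) 0x96, 0x6C, 0x00, 0x00};
        System.out.println("first IFD at 0x" + Long.toHexString(firstIfdOffset(header)));
    }
}
```

Run against the header bytes above, this reports 0x6c96 as the IFD offset.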
If you skip ahead to this spot you see this:
0F 00 <- this is the total number of tags, each tag is 12 bytes
# | ID |Type | Count | Value |
01. 00 01 04 00 01 00 00 00 A2 06 00 00 width = 6a2
02. 01 01 04 00 01 00 00 00 4A 04 00 00 height = 44a
03. 02 01 03 00 01 00 00 00 01 00 00 00 bits per sample = 1
04. 03 01 03 00 01 00 00 00 04 00 00 00 Compression = CCITT G4
05. 06 01 03 00 01 00 00 00 00 00 00 00 Photometric = min is white
06. 0A 01 04 00 01 00 00 00 01 00 00 00 Fill order = msb to lsb
07. 11 01 04 00 01 00 00 00 08 00 00 00 Offset of strips = 8
08. 15 01 03 00 01 00 00 00 01 00 00 00 Samples per pixel = 1
09. 16 01 04 00 01 00 00 00 4A 04 00 00 Rows per strip = 44a
0a. 17 01 04 00 01 00 00 00 5B 6C 00 00 Strip byte counts = 6c5b
0b. 1A 01 05 00 01 00 00 00 63 6C 00 00 Offset to x resolution = 6c63
0c. 1B 01 05 00 01 00 00 00 6B 6C 00 00 Offset to y resolution = 6c6b
0d. 1C 01 03 00 01 00 00 00 01 00 00 00 Planar Config = Contiguous
0e. 28 01 03 00 01 00 00 00 02 00 00 00 Resolution unit = inches
0f. 31 01 02 00 23 00 00 00 73 6C 00 00 Software string offset = 6c73
Location of next IFD, 0 means no more
00 00 00 00
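Each of those 12-byte rows splits the same way: 2 bytes of tag ID, 2 bytes of type, 4 bytes of count, 4 bytes of value or offset. A rough Java sketch of that split follows. `IfdEntryDemo` is a made-up name, and reading the last field as a single unsigned 32-bit number is a simplification that only suits a LONG with count 1 or an offset.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of iText.
class IfdEntryDemo {
    final int tag;            // e.g. 0x010A for FillOrder
    final int type;           // 3 = SHORT, 4 = LONG, ...
    final long count;         // number of values of that type
    final long valueOrOffset; // raw 4-byte field, read here as one unsigned 32-bit number

    IfdEntryDemo(byte[] entry, ByteOrder order) {
        if (entry.length != 12) {
            throw new IllegalArgumentException("an IFD entry is 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(entry).order(order);
        tag = buf.getShort(0) & 0xffff;
        type = buf.getShort(2) & 0xffff;
        count = buf.getInt(4) & 0xffffffffL;
        valueOrOffset = buf.getInt(8) & 0xffffffffL;
    }

    @Override
    public String toString() {
        return String.format("tag=0x%04x type=%d count=%d value=0x%x",
                tag, type, count, valueOrOffset);
    }

    public static void main(String[] args) {
        // Row 06 of the dump above: the FillOrder entry.
        byte[] fillOrder = {0x0A, 0x01, 0x04, 0x00, 0x01, 0, 0, 0, 0x01, 0, 0, 0};
        System.out.println(new IfdEntryDemo(fillOrder, ByteOrder.LITTLE_ENDIAN));
    }
}
```

For row 06 this gives tag 0x010a, type 4, count 1, value 1, which is the entry that matters for the rest of this answer.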
Now, looking at the call stack and tracing it back to source, I see a call being made to get the fill order. Fill order for 1 bit files describes whether the high order bit or the low order bit in a byte is leftmost in display.
TIFFField fillOrderField = dir.getField(TIFFConstants.TIFFTAG_FILLORDER);
if (fillOrderField != null)
    fillOrder = fillOrderField.getAsInt(0);
We know that this will get called since there is a fill order tag in your IFD, which is a 4 byte integer with value 1.
Unfortunately for you, that call to TIFFField.getAsInt(0) is causing a failure.
If you look at that code:
public int getAsInt(int index) {
    switch (type) {
    case TIFF_BYTE: case TIFF_UNDEFINED:
        return ((byte[])data)[index] & 0xff;
    case TIFF_SBYTE:
        return ((byte[])data)[index];
    case TIFF_SHORT:
        return ((char[])data)[index] & 0xffff;
    case TIFF_SSHORT:
        return ((short[])data)[index];
    case TIFF_SLONG:
        return ((int[])data)[index];
    default:
        throw new ClassCastException();
    }
}
You can see that it can throw a ClassCastException if type doesn't match, and in this case it will since those type constants in the cases are 1, 7, 6, 3, 8 and 9 respectively and the tag's type is 4.
So why is the code wrong?
The spec is pretty clear that the FillOrder tag (0x10A) should be an unsigned short (type 3), but the tag in your file is an unsigned 4 byte int (type 4), and the switch statement there doesn't account for that (there is no case for TIFF_LONG).
Why is there no case for this? Looking at the surrounding code, this library treats 4 byte unsigned integers as the Java type 'long'. Treating a 4 byte unsigned int as a 4 byte signed int could overflow into the sign bit (even though none of the legal values for this tag would trigger that), so because the cast might cause an error, it is always treated as one.
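To make the sign-bit concern concrete, here is a small standalone Java sketch. It is my own illustration, not iText code, and `SignBitDemo` and `asUnsigned` are made-up names. It shows what happens when an unsigned 32-bit value held in a long is narrowed to int.

```java
// Hypothetical demo for illustration; not part of iText.
class SignBitDemo {
    /** Widens the raw 32 bits to a long so the value stays non-negative. */
    static long asUnsigned(int rawBits) {
        return rawBits & 0xffffffffL;
    }

    public static void main(String[] args) {
        long small = asUnsigned(0x00000001); // the FillOrder value in this file
        long large = asUnsigned(0xFFFFFFFF); // largest value a TIFF LONG can hold

        System.out.println((int) small); // 1: narrowing is harmless here
        System.out.println((int) large); // -1: the top bit became the sign bit
        try {
            Math.toIntExact(large);      // checked narrowing refuses instead
        } catch (ArithmeticException ex) {
            System.out.println("does not fit in int: " + large);
        }
    }
}
```

The plain cast silently wraps, while `Math.toIntExact` throws only when the value really is out of range. Either would have let a FillOrder of 1 through.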
Ultimately this bug has two causes:
- Java has precisely one unsigned integer type (char, for those of you playing along at home) and this library chose to use long to represent an unsigned 4 byte int.
- This particular file is out of spec and used unsigned int for this tag.
Or more specifically, there is an impedance mismatch between the chosen Java types and this TIFF file. This field code is attempting to be type-strong. The calling code is attempting to accept a wide variety of types. It missed this one case.
I looked at my own tag code for grins to see if it would suffer from this particular problem. The answer is no as my version of getIntValue() will let you overflow into the sign bit if that's what you want to do.
So the real fix is to change the code to:
TIFFField fillOrderField = dir.getField(TIFFConstants.TIFFTAG_FILLORDER);
if (fillOrderField != null)
    fillOrder = (int)fillOrderField.getAsLong(0);
or alternately to perform hex surgery on your file and change the data type of the fill order tag to unsigned short. This is ultimately a poor solution as that consuming code is still susceptible to bad TIFF files.
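If you do go the hex surgery route, the patch itself is tiny. The sketch below is an assumption-laden illustration: `FillOrderPatchDemo` and `patchFillOrder` are hypothetical names, it works on a byte array in memory, and it only walks the first IFD. It finds a FillOrder entry typed as LONG and rewrites it as SHORT.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of iText.
class FillOrderPatchDemo {
    static final int TAG_FILL_ORDER = 0x010A;
    static final int TYPE_SHORT = 3;
    static final int TYPE_LONG = 4;

    /** Patches the first IFD in place; returns true if an entry was rewritten. */
    static boolean patchFillOrder(byte[] tiff) {
        ByteOrder order = tiff[0] == 'I' ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        ByteBuffer buf = ByteBuffer.wrap(tiff).order(order); // writes go to tiff[]
        int ifd = buf.getInt(4);
        int entries = buf.getShort(ifd) & 0xffff;
        for (int i = 0; i < entries; i++) {
            int pos = ifd + 2 + 12 * i;              // 2-byte count, then 12 bytes per entry
            int tag = buf.getShort(pos) & 0xffff;
            int type = buf.getShort(pos + 2) & 0xffff;
            int count = buf.getInt(pos + 4);
            if (tag == TAG_FILL_ORDER && type == TYPE_LONG && count == 1) {
                int value = buf.getInt(pos + 8);
                buf.putShort(pos + 2, (short) TYPE_SHORT);
                buf.putInt(pos + 8, 0);               // clear the 4-byte value field
                buf.putShort(pos + 8, (short) value); // a SHORT sits in the first 2 bytes
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Synthetic file: 8-byte header, then a one-entry IFD at offset 8.
        byte[] tiff = {
            0x49, 0x49, 0x2A, 0x00, 0x08, 0, 0, 0,
            0x01, 0x00,
            0x0A, 0x01, 0x04, 0x00, 0x01, 0, 0, 0, 0x01, 0, 0, 0,
            0, 0, 0, 0
        };
        System.out.println("patched: " + patchFillOrder(tiff) + ", type is now " + tiff[12]);
    }
}
```

Work on a copy of the file. This fixes one file and nothing else, which is why the code change above is the better fix.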
Gratuitous Screed
One thing I've learned in the past 10 years of working with TIFF files is that there is no shortage of broken TIFF files and no shortage of engineers who either didn't read the spec or failed to implement it correctly making new broken files (and once in a while, I have been that engineer). A number of these are grad students who need TIFF output RIGHT NOW and write a quick and dirty (broken) encoder which they consider correct when IrfanView can open their output (which is an invalid test since IrfanView, and my TIFF codec as well, open a wide variety of fundamentally broken TIFFs).
The TIFF specification is deceptively straightforward. I say that because the format itself feels like it should be relatively easy to generate. Tags are logical, IFDs are simple collections of tags, pointer tags can be tricky, but are manageable. What happens is that code gets written that lacks a level of abstraction which would prevent classes of error that would otherwise slip through.
This particular file was not written by a grad student. At least I don't think so.
In this case, the problem was likely caused by fCoder. We know this because they put their name in the Software string: "Created by fCoder Graphics Processor". I'm calling them out because they put in the software string to identify themselves. This bug (an incorrect type, likely due to a copy-paste error in their source), while minor, is causing problems, and maybe they'll fix it. In my world, the #1 top-priority-drop-everything bug is "generates a bad file," and if I had done this, I sure as hell would want to know so I could fix my code. In the meantime, iText should also update their code to accept this class of file.
Lessons learned:
- The specification is the answer to the question "is my file correct."
- It's hard to write a decent TIFF encoder or decoder. Consider a commercial library before writing your own (although in this example, we found bugs in not one but two commercial libraries).
- Put in the software string when you generate your file so we can contact you when there is an issue.
Here endeth the lesson.