Going through the PDF spec, it says that the trailer precedes the startxref. To me, that says the xref can appear anywhere in the document, but the trailer still appears before the startxref. This makes sense until you have to parse it: because you have to parse in reverse, you can't take comments or strings into account. Let's get a little more wacky, then.
trailer<< %\
/Size 4 %\
/Root 1 0 R %\
/Info 4 0 R %\
/Key (\
trailer<< %\
/Size 4 %\
/Root 2 0 R %\
/Info 3 0 R %\
>>%)
>>%)
% test test )
startxref
15
%%EOF
This is a perfectly valid trailer. The first one is the real trailer, but the second one is inside a string. In this case, reverse parsing is going to fail to catch the comments, and looking for the keyword trailer is going to fail if it is part of a comment or string. What is the best way of finding out where the trailer starts?
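To make the left-to-right reading concrete, here is a minimal Python sketch of a forward scan that reports only the trailer keywords lying outside comments and literal strings. The function name find_trailer_keywords and the sample variable are made up for illustration. It assumes the input contains no stream data and does not check that the keyword is delimited on both sides, so treat it as a sketch rather than a real PDF lexer.

```python
def find_trailer_keywords(data: bytes) -> list:
    """Return offsets of `trailer` keywords outside comments and literal strings."""
    hits = []
    i, n = 0, len(data)
    depth = 0  # nesting depth of literal-string parentheses
    while i < n:
        c = data[i:i + 1]
        if depth > 0:
            if c == b"\\":
                i += 2  # skip the escaped byte: \( \) \\ or a line continuation
                continue
            if c == b"(":
                depth += 1
            elif c == b")":
                depth -= 1
            i += 1
        elif c == b"%":
            # outside a string, a comment runs to the end of the line
            while i < n and data[i:i + 1] not in (b"\r", b"\n"):
                i += 1
        elif c == b"(":
            depth = 1
            i += 1
        elif data.startswith(b"trailer", i):
            hits.append(i)
            i += len(b"trailer")
        else:
            i += 1
    return hits


# The trailer from the post, with every comment ending in a backslash.
sample = (
    b"trailer<< %\\\n"
    b"/Size 4 %\\\n"
    b"/Root 1 0 R %\\\n"
    b"/Info 4 0 R %\\\n"
    b"/Key (\\\n"
    b"trailer<< %\\\n"
    b"/Size 4 %\\\n"
    b"/Root 2 0 R %\\\n"
    b"/Info 3 0 R %\\\n"
    b">>%)\n"
    b">>%)\n"
    b"% test test )\n"
    b"startxref\n15\n%%EOF\n"
)

print(find_trailer_keywords(sample))  # [0]
```

Only the first trailer is reported, because the second one is swallowed by the string that opens at /Key.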
Update: this trailer seems to open in Acrobat Reader:
%PDF-1.3
%âãÏÓ
xref
0 4
00000000 65535 f
00000110 00000 n
00000250 00000 n
00000315 00000 n
00000576 00000 n
1 0 obj <<
/Type /Catalog
/Pages 2 0 R
/OpenAction [ 3 0 R /XYZ null null null ]
/PageLabels << /Nums [0 << /S /D >> ] >>
>>
endobj
2 0 obj <<
/Type /Pages
/Kids [ 3 0 R ]
/Count 1
>>
endobj
3 0 obj <<
/Type /Page
/Parent 2 0 R
/Resources << >>
/MediaBox [ 0 0 612 792 ]
>>
endobj
4 0 obj <<
/Producer (Me)
/CreationDate (D:20110626000000Z)
>>
endobj
trailer<< %\
/Size 4 %\
/Root 1 0 R %\
/Info 4 0 R %\
/Key (\
trailer<< %\
/Size 4 %\
/Root 2 0 R %\
/Info 3 0 R %\
>>%)
>>%)
% test test )
startxref
15
%%EOF
As far as syntax goes, this conforms to the spec. Somehow Acrobat seems to know whether it is in a comment or in a string. Parsing left to right, the second trailer is inside a string that has a % tacked on the end, and a comment follows the real trailer. But parsing right to left, you have no idea whether the first ) is part of a comment or the end of a string definition.
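The ambiguity can be shown with a small Python sketch that labels every byte while scanning left to right. The name classify_bytes and the shortened tail fragment are assumptions for illustration, and the sketch only knows about comments and literal strings (no streams, no hex strings). The two >>%) lines are byte-for-byte identical, yet their closing parentheses get different labels, and the only thing that tells them apart is the text that came before.

```python
def classify_bytes(data: bytes) -> list:
    """Label each byte 'code', 'comment', or 'string' by scanning left to right."""
    labels = []
    depth = 0           # nesting depth of literal-string parentheses
    in_comment = False
    escaped = False     # previous byte was a backslash inside a string
    for b in data:
        ch = bytes([b])
        if in_comment:
            if ch in (b"\r", b"\n"):
                in_comment = False      # end of line ends the comment
                labels.append("code")
            else:
                labels.append("comment")
        elif depth > 0:
            labels.append("string")
            if escaped:
                escaped = False         # this byte was escaped, take it literally
            elif ch == b"\\":
                escaped = True
            elif ch == b"(":
                depth += 1
            elif ch == b")":
                depth -= 1
        elif ch == b"%":
            in_comment = True
            labels.append("comment")
        elif ch == b"(":
            depth = 1
            labels.append("string")
        else:
            labels.append("code")
    return labels


# Shortened end of the trailer: a string opens at /Key and closes on the first ">>%)".
tail = b"/Key (\\\n/Info 3 0 R %\\\n>>%)\n>>%)\n"
labels = classify_bytes(tail)
first = tail.index(b">>%)")
second = tail.index(b">>%)", first + 1)

print(labels[first + 3])   # string
print(labels[second + 3])  # comment
```

A right-to-left scan sees the same four bytes in both places, so it cannot make this distinction without first finding where the string began.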
Another example:
%PDF-1.3
%âãÏÓ
xref
0 8
0000000000 65535 f
0000000210 00000 n
0000000357 00000 n
0000000428 00000 n
0000000533 00000 n
0000000612 00000 n
0000000759 00000 n
0000000830 00000 n
0000000935 00000 n
1 0 obj <<
/Type /Catalog
/Pages 2 0 R
/OpenAction [ 3 0 R /XYZ null null null ]
/PageLabels << /Nums [0 << /S /D >> ] >>
>>
endobj
2 0 obj <<
/Type /Pages
/Kids [ 3 0 R ]
/Count 1
>>
endobj
3 0 obj <<
/Type /Page
/Parent 2 0 R
/Resources << >>
/MediaBox [ 0 0 612 792 ]
>>
endobj
4 0 obj <<
/Producer (Me)
/CreationDate (D:20110626000000Z)
>>
endobj
5 0 obj <<
/Type /Catalog
/Pages 6 0 R
/OpenAction [ 7 0 R /XYZ null null null ]
/PageLabels << /Nums [0 << /S /D >> ] >>
>>
endobj
6 0 obj <<
/Type /Pages
/Kids [ 7 0 R ]
/Count 1
>>
endobj
7 0 obj <<
/Type /Page
/Parent 6 0 R
/Resources << >>
/MediaBox [ 0 0 100 100 ]
>>
endobj
8 0 obj <<
/Producer (Me)
/CreationDate (D:20110626000000Z)
>>
endobj
trailer<< %\
/Size 8 %\
/Root 1 0 R %\
/Info 4 0 R %\
/Key (\
trailer<< %\
/Size 8 %\
/Root 5 0 R %\
/Info 8 0 R %\
>>%)
>>%)
% test test )
startxref
17
%%EOF
This example is displayed correctly in Acrobat Reader. In my last case, you claimed it would fail because the "root" node is invalid, but in this new sample the root is valid; it's just never actually used. So shouldn't it display a 100x100 window instead of the 8.5"x11" one?
In regard to Resources, the spec says:
(Required; inheritable) A dictionary containing any resources required by the page
(see Section 3.7.2, “Resource Dictionaries”). If the page requires no resources, the
value of this entry should be an empty dictionary. Omitting the entry entirely
indicates that the resources are to be inherited from an ancestor node in the page
tree.