
Here's a hello-world PDF I created myself. Its xref information is stored in an xref stream (XRefStm). However, it fails to open in Adobe Reader. Can somebody tell me the reason? Thank you in advance!

RoyDeng.

%PDF-1.7
1 0 obj << /Length 94 >>stream
BT 10 782 Td /0 50 Tf 50 TL (Hello)' (World)' (OK)Tj (World)' Tj ET
endstream endobj
2 0 obj << /Count 1 /Kids 3 0 R /Type /Pages >> endobj
3 0 obj [ 4 0 R ] endobj
4 0 obj << /Contents 5 0 R /MediaBox 6 0 R /Parent 2 0 R /Resources 10 0 R /Type /Page >> endobj
5 0 obj [ 1 0 R ] endobj
6 0 obj [ 0 0 612 792 ] endobj
7 0 obj << /BaseFont /Helvetica /Encoding /MacRomanEncoding /Subtype /Type1 /Type /Font >> endobj
8 0 obj << /0 7 0 R >> endobj
9 0 obj [ /PDF /Text ] endobj
10 0 obj << /Font 8 0 R /ProcSet 9 0 R >> endobj
11 0 obj << /Pages 2 0 R /Type /Catalog /PageLayout /OneColumn >> endobj
12 0 obj << /Type /XRef /Index [0 11] /W [1 4 1] /Filter /ASCIIHexDecode /Size 12 /Length 144 /Root 11 0 R >>stream
00 00000000 00
01 00000009 00
01 0000009A 00
01 000000D1 00
01 000000EA 00
01 0000014B 00
01 00000164 00
01 00000183 00
01 000001E5 00
01 00000203 00
01 00000221 00
01 00000252 00
endstream
endobj
startxref
667
%%EOF

Dingo
RoyDeng

2 Answers


The first obvious difference is that the offsets contain letters. In a classic xref table every offset is written as decimal digits, so no letters appear there; here the ASCIIHexDecode filter writes the binary xref fields as hex digits, and the xref structure is reported as not understood, with the common error message:

Could not read x-ref table

This all comes down to the method used here to encode object 12 as a hex index. That is not wrong in itself, since the xref may be EITHER a plain-text table OR an encoded xref stream, but for "compatibility" the two should never be mixed (see "Mixing XRef Tables and XRef Streams").

01 0000009A 00
01 000000D1 00
01 000000EA 00
01 0000014B 00
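To see what the stream actually contains, here is a minimal Python sketch (the function names are hypothetical, not from any library) that applies ASCIIHexDecode as the spec describes it (whitespace ignored, `>` as end-of-data, an odd final digit padded with 0) and then splits the bytes into rows using the question's /W [1 4 1]. The letters turn out to be ordinary hex digits, e.g. 0000009A is offset 154. It also makes the row count easy to check: the question lists 12 rows, while its /Index [0 11] declares only 11 entries.

```python
def ascii_hex_decode(text: str) -> bytes:
    """ASCIIHexDecode: skip whitespace, stop at '>', pad an odd last digit with 0."""
    digits = []
    for ch in text:
        if ch == ">":              # end-of-data marker
            break
        if ch.isspace():           # whitespace between digits is ignored
            continue
        digits.append(ch)
    if len(digits) % 2:            # odd number of digits: assume a trailing 0
        digits.append("0")
    return bytes.fromhex("".join(digits))


def parse_xref_rows(raw: bytes, widths):
    """Split decoded xref-stream bytes into tuples of big-endian integers, one per /W field."""
    row_len = sum(widths)
    if len(raw) % row_len:
        raise ValueError("decoded length is not a multiple of the /W row size")
    rows = []
    for start in range(0, len(raw), row_len):
        fields, pos = [], start
        for w in widths:
            fields.append(int.from_bytes(raw[pos:pos + w], "big"))
            pos += w
        rows.append(tuple(fields))
    return rows


# The stream data from the question (no '>' end marker was given there).
QUESTION_STREAM = """
00 00000000 00
01 00000009 00
01 0000009A 00
01 000000D1 00
01 000000EA 00
01 0000014B 00
01 00000164 00
01 00000183 00
01 000001E5 00
01 00000203 00
01 00000221 00
01 00000252 00
"""

rows = parse_xref_rows(ascii_hex_decode(QUESTION_STREAM), [1, 4, 1])
for obj_num, (kind, offset, gen) in enumerate(rows):
    print(obj_num, kind, offset, gen)    # offsets printed in decimal
print(len(rows), "rows listed; /Index [0 11] declares 11")
```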

I am unable to find any files that use this hex-encoded method successfully, so I cannot show a working structure, but the /W field widths tend to be [1 2 1] or [1 2 2]. This approach has certainly been a frequent failure; see the similar question "PDF that renders in Chrome but not in Acrobat", where a common final comment in many such cases is:

Since that I've managed to test, that the compression can be either nothing or FlateDecode w/wo Predictor, but nothing else. Namely the ASCIIHexDecode demonstrated in the RM is unusable.
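The /W array gives the byte width of each field in a row (type, then offset, then generation). As a rough illustration of why small files tend to end up with /W [1 2 1], here is a sketch (hypothetical helper names) that picks the narrowest widths able to hold every value and packs the rows as big-endian bytes:

```python
def min_width(value: int) -> int:
    """Number of bytes needed to store a non-negative integer (at least 1)."""
    return max(1, (value.bit_length() + 7) // 8)


def choose_widths(entries):
    """Narrowest /W array able to hold every (type, field2, field3) entry."""
    return [max(min_width(entry[i]) for entry in entries) for i in range(3)]


def pack_entries(entries, widths):
    """Serialise entries as consecutive big-endian fields, one row per entry."""
    out = bytearray()
    for entry in entries:
        for value, width in zip(entry, widths):
            out += value.to_bytes(width, "big")  # OverflowError if the value does not fit
    return bytes(out)


# Offsets below 65536 fit in two bytes, so a small file ends up with /W [1 2 1].
entries = [(0, 0, 0), (1, 9, 0), (1, 154, 0), (1, 594, 0)]
widths = choose_widths(entries)
print(widths)                          # [1, 2, 1]
print(pack_entries(entries, widths).hex(" "))
```

If the free entry for object 0 carried generation 65535, as it does in a classic table, the same function would return [1, 2, 2] for a file with offsets above 255.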

A normal /FlateDecode-encoded xref stream would look something like this, but it is even harder to follow:

12 0 obj
<</DecodeParms<</Columns 4/Predictor 12>>/Filter/FlateDecode/Length 57/Root 1 0 R/Size 12/Type /XRef/W [1 2 1]>>
stream
xÚcb``øÏÄÈÀÏÈÄÀÁÀÄÈ0—é?÷-&Ưj 1â?wòo&ƯsÙÅxÕ~  zoW
endstream
endobj
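In that example, /Predictor 12 with /Columns 4 (the row size for /W [1 2 1]) means PNG "Up" prediction: each row is prefixed with filter-type byte 2 and stored as the byte-wise difference from the row above, and the result is then zlib-compressed for /FlateDecode. A minimal Python sketch of that encoding and its inverse, assuming the rows are already packed and every row uses the Up filter (function names are hypothetical):

```python
import zlib


def png_up_encode(rows, columns):
    """Apply the PNG Up filter: tag byte 2, then each byte minus the byte above it."""
    prev = bytes(columns)                      # the row above the first row is all zeros
    out = bytearray()
    for row in rows:
        if len(row) != columns:
            raise ValueError("every row must be exactly /Columns bytes long")
        out.append(2)                          # PNG filter type 2 = Up
        out.extend((cur - up) & 0xFF for cur, up in zip(row, prev))
        prev = row
    return bytes(out)


def png_up_decode(data, columns):
    """Undo the Up filter; this sketch rejects any other PNG filter type."""
    prev = bytes(columns)
    rows = []
    stride = columns + 1                       # tag byte + row bytes
    for start in range(0, len(data), stride):
        if data[start] != 2:
            raise ValueError("only the Up filter (type 2) is handled here")
        chunk = data[start + 1:start + stride]
        row = bytes((cur + up) & 0xFF for cur, up in zip(chunk, prev))
        rows.append(row)
        prev = row
    return rows


# Rows already packed for /W [1 2 1]: type, 2-byte offset, generation.
rows = [bytes([0, 0, 0, 0]), bytes([1, 0, 9, 0]), bytes([1, 0, 154, 0])]
stream_data = zlib.compress(png_up_encode(rows, 4))    # bytes between stream/endstream
print(len(stream_data), "bytes for /Length")
assert png_up_decode(zlib.decompress(stream_data), 4) == rows
```

Because consecutive offsets differ only slightly, the Up filter turns most bytes into small values or zeros, which is what makes the compressed xref so compact.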

Many (but not all) PDF viewers can work around a poor xref index, even a missing one. However, the usual best approach is to use a simple, unencoded decimal xref table and trailer like the one below, which is why many file fixers unpack those encodings and rebuild the xref in this simpler format. This version was accepted by all the viewers I tested. Beware that simple copy and paste may change the EOL characters (on Windows especially), which shifts every byte offset recorded in the table.

%PDF-1.7
1 0 obj <</Length 67>> stream
BT 10 782 Td /0 50 Tf 50 TL (Hello)' (World)' (OK)Tj (World)' Tj ET
endstream endobj
2 0 obj <</Count 1/Kids 3 0 R/Type/Pages>> endobj
3 0 obj [4 0 R] endobj
4 0 obj <</Contents 5 0 R/MediaBox 6 0 R/Parent 2 0 R/Resources 10 0 R/Type/Page>> endobj
5 0 obj [1 0 R] endobj
6 0 obj [0 0 612 792] endobj
7 0 obj <</BaseFont/Helvetica/Encoding/MacRomanEncoding/Subtype/Type1/Type/Font>> endobj
8 0 obj <</0 7 0 R>> endobj
9 0 obj [/PDF/Text] endobj
10 0 obj <</Font 8 0 R/ProcSet 9 0 R>> endobj
11 0 obj <</Pages 2 0 R/Type/Catalog/PageLayout/OneColumn>> endobj
xref
0 12
0000000000 65535 f 
0000000009 00000 n 
0000000124 00000 n 
0000000174 00000 n 
0000000197 00000 n 
0000000287 00000 n 
0000000310 00000 n 
0000000339 00000 n 
0000000428 00000 n 
0000000456 00000 n 
0000000483 00000 n 
0000000529 00000 n 
trailer
<</Size 12/Root 11 0 R>>
startxref
596
%%EOF

Result: [screenshot of the rendered page]
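Rather than counting bytes by hand, the offsets can be recorded while the file is written. A minimal Python sketch (build_pdf and stream_obj are hypothetical helpers), assuming object 1 is the catalog, every object has generation 0, and LF line endings are used; writing in binary mode also avoids the EOL translation mentioned above:

```python
def stream_obj(content: bytes) -> bytes:
    """Dictionary plus stream; /Length counts only the bytes between the EOLs."""
    return b"<</Length %d>> stream\n" % len(content) + content + b"\nendstream"


def build_pdf(objects):
    """objects[i] is the body of object i+1; object 1 is assumed to be the catalog."""
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))                       # byte offset of "num 0 obj"
        out += b"%d 0 obj " % num + body + b" endobj\n"
    xref_pos = len(out)
    size = len(objects) + 1
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"                    # head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off               # each entry is exactly 20 bytes
    out += b"trailer\n<</Size %d/Root 1 0 R>>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


CONTENT = b"BT /0 50 Tf 50 TL 10 782 Td T* (Hello) Tj T* (World) Tj ET"
pdf = build_pdf([
    b"<</Type/Catalog/Pages 2 0 R>>",
    b"<</Type/Pages/Kids[3 0 R]/Count 1>>",
    b"<</Type/Page/Parent 2 0 R/MediaBox[0 0 612 792]"
    b"/Resources<</Font<</0 4 0 R>>/ProcSet[/PDF/Text]>>/Contents 5 0 R>>",
    b"<</Type/Font/Subtype/Type1/BaseFont/Helvetica/Encoding/MacRomanEncoding>>",
    stream_obj(CONTENT),
])

if __name__ == "__main__":
    with open("hello.pdf", "wb") as f:                 # binary mode: no newline translation
        f.write(pdf)
```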

To reduce the amount of offset calculation needed, the same 11 objects can be made simpler and smaller, with a few adjustments, as only 5 objects:

%PDF-1.7
1 0 obj <</PageLayout /OneColumn/Pages 2 0 R/Type/Catalog>> endobj
2 0 obj <</Count 1/Kids [ 3 0 R ]/Type/Pages>> endobj
3 0 obj <</Contents 5 0 R/MediaBox [ 0 0 612 792 ]/Parent 2 0 R/Resources<</Font<</0 4 0 R >>/ProcSet[/PDF/Text]>>/Type/Page>> endobj
4 0 obj <</BaseFont /Helvetica/Encoding/MacRomanEncoding/Subtype/Type1/Type /Font>> endobj
5 0 obj <</Length 88>> stream
q BT 50 TL /0 50 Tf 10 782 Td T* (Hello) Tj T* (World) Tj (OK) Tj T* (World) Tj ET Q q Q
endstream
endobj

xref
0 6
0000000000 65535 f 
0000000009 00000 n 
0000000076 00000 n 
0000000130 00000 n 
0000000264 00000 n 
0000000355 00000 n 
trailer
<</Size 6/Root 1 0 R>>
startxref
492
%%EOF
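Hand-written files like these are easy to get slightly wrong: a /Length off by a few bytes, or an offset that misses its object. A rough Python checker sketch (hypothetical function names), assuming a single classic xref subsection, direct integer /Length values, and that `/Length N>>` sits right before `stream`; it is a regex heuristic, not a real PDF parser:

```python
import re

# Heuristic: a direct "/Length N" closing the dictionary right before "stream".
STREAM_RE = re.compile(rb"/Length\s+(\d+)\s*>>\s*stream\r?\n")


def check_stream_lengths(pdf: bytes):
    """Return (declared, actual) for each stream whose /Length does not match."""
    problems = []
    for m in STREAM_RE.finditer(pdf):
        declared = int(m.group(1))
        end = pdf.index(b"endstream", m.end())
        data = pdf[m.end():end]
        for eol in (b"\r\n", b"\n", b"\r"):            # the EOL before endstream is not counted
            if data.endswith(eol):
                data = data[:-len(eol)]
                break
        if len(data) != declared:
            problems.append((declared, len(data)))
    return problems


def check_xref_table(pdf: bytes):
    """Return object numbers whose in-use xref entry does not point at 'N G obj'."""
    pos = int(pdf[pdf.rindex(b"startxref"):].split()[1])
    lines = pdf[pos:].splitlines()
    if lines[0].strip() != b"xref":
        raise ValueError("startxref does not point at an xref table")
    first, count = map(int, lines[1].split())
    bad = []
    for i in range(count):
        offset, gen, kind = lines[2 + i].split()
        num = first + i
        if kind == b"n" and not pdf[int(offset):].startswith(b"%d %d obj" % (num, int(gen))):
            bad.append(num)
    return bad


content = b"BT /0 50 Tf 10 782 Td (Hello) Tj ET"
head = b"%PDF-1.7\n1 0 obj <</Length 99>> stream\n" + content + b"\nendstream endobj\n"
sample = head + (b"xref\n0 2\n0000000000 65535 f \n%010d 00000 n \n"
                 b"trailer\n<</Size 2>>\nstartxref\n%d\n%%%%EOF\n" % (9, len(head)))
print(check_stream_lengths(sample))    # [(99, 35)]
print(check_xref_table(sample))        # []
```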
K J
  • I think the OP used an xref stream (and not an xref table) for a reason. – mkl Mar 09 '23 at 06:01
  • Not really. In the PDF spec at the end of section 7.5.8.4 there is an example PDF dump using `/Filter /ASCIIHexDecode`. But as the spec explains, this is only done *to make the format and contents of the cross reference stream readable*... – mkl Mar 09 '23 at 18:43
  • @mkl yes, I spotted that; the implication is that it's an academic exercise, not a working method. As far as I can tell, no reader would actually be expected to understand it, so fixers replace it with plain text or /Flate. – K J Mar 09 '23 at 18:46
  • Well, of course a reader is expected to be able to parse that. After all, it's valid PDF syntax. It's merely not commonly used. – mkl Mar 10 '23 at 06:58

I tried to repair this file with pdftk:

pdftk 1.pdf output fixed.pdf
Error: Failed to open PDF file: 
   1.pdf
Errors encountered.  No output created.
Done.  Input errors, so no output created.

Then I tried cpdftk (from Coherent), which gave more information:

#  cpdftk 1.pdf output fixed.pdf
cpdf could not read the file. Technical details follow:

Could not read x-ref table

Finally I tried to repair the file with Multivalent.jar, and Multivalent did repair the PDF (I attach the result).

Dingo
  • Thanks for the help. I looked at 1-x.pdf and found that it has no filter on the XRef stream, so I simply removed the filter from my file. Good god! It now opens normally in Adobe Reader! What the hell is going on with the filter!? Orz – RoyDeng Feb 13 '12 at 04:58