
I have generated a PDF using the PDF code below. It opens fine, but when I try to close it, I am asked to save it ("Do you want to save changes to 'xxx.pdf' before closing?"). I have analyzed my PDF code to find the problem, and I suspect it lies in the startxref offset and the xref offset positions. I have made many changes but could not solve it. Here is my PDF code:

%PDF-1.4
%âãÏÓ
1 0 obj
<<
/Type/Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type/Pages
/MediaBox[0 0 612.0 792.0]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type/Page
/Parent 2 0 R
/Resources 4 0 R 
/Contents 5 0 R
>>
endobj
4 0 obj
<<
/ExtGState <</GS1 7 0 R>>
/ProcSet[/PDF/Text/ImageB/ImageC/ImageI]
/Font<< /F1 8 0 R >>
>>
>>
endobj
5 0 obj
<</Length 44>>
stream
BT
/F1 18 Tf
0 g
1 0 0 1 100.0 400.0 Tm
(kersom) Tj
ET
endstream
endobj
6 0 obj<</Producer(Xxxxxxxx XXX Xxxxxxxx - 1.1)>>
endobj
7 0 obj
<</ca 0.35/CA 0.35>>
endobj
8 0 obj
<<
/Type /Font 
/Subtype /Type1
/BaseFont /Helvetica
>>
endobj
xref
0 9
0000000000 65535 f
0000000015 00000 n
0000000063 00000 n
0000000148 00000 n
0000000228 00000 n
0000000340 00000 n
0000000442 00000 n
0000000499 00000 n
0000000535 00000 n
trailer
<<
/Info 6 0 R
/Root 1 0 R
/Size 9
>>
startxref
606
%%EOF
kerZy Hart
  • Please provide the file as a binary. In the textual form you provided, it is not clear which white space characters have been used where (especially which line ends). Thus, *a problem in startxref offset size and xref offset position* cannot be addressed in this form. – mkl May 23 '14 at 11:25
  • @mkl: I'm sorry, please bear with me. I didn't understand "the file as a binary". Do you want me to upload the PDF file or convert the PDF to streams? – kerZy Hart May 23 '14 at 13:55
  • *upload the PDF file*, depending on your file server making sure you don't do that in text mode (if you upload using ftp, please use binary mode). – mkl May 23 '14 at 14:06
  • @mkl: I have sent the PDF file to the following mail ID: mkl@wir-sind-cool.org – kerZy Hart May 23 '14 at 14:25
  • I'll look at it tomorrow. – mkl May 23 '14 at 22:39
  • @mkl:Eagerly waiting for your reply. – kerZy Hart May 26 '14 at 07:26
  • See my answer below, the cross reference table structure itself is broken. – mkl May 26 '14 at 07:52

2 Answers


Having received the sample PDF in its original form, the issue immediately becomes clear: The offsets in the cross reference table are correct but that table itself is incorrectly built.

Let's look at a hex dump:

[Image: hex dump of the cross reference table]

Obviously each entry in the cross reference table is 19 bytes in size.

Now let's look at the PDF specification:

Each entry shall be exactly 20 bytes long, including the end-of-line marker. [...] The format of an in-use entry shall be:

nnnnnnnnnn ggggg n eol

where:

nnnnnnnnnn shall be a 10-digit byte offset in the decoded stream 
ggggg shall be a 5-digit generation number 
n shall be a keyword identifying this as an in-use entry 
eol shall be a 2-character end-of-line sequence

[...] a 2-character end-of-line sequence consisting of one of the following: SP CR, SP LF, or CR LF. Thus, the overall length of the entry shall always be exactly 20 bytes

(section 7.5.4 Cross-Reference Table of ISO 32000-1)

Thus, the issue in the OP's PDF is that each cross reference table entry has an end-of-line sequence of only one byte, a LF, while it must have a 2-byte end-of-line sequence, either SP CR, SP LF, or CR LF.

This makes each entry one byte too short. Because a reader locates the entry for an object by stepping through the table in fixed 20-byte strides, look-ups from that table return utterly broken byte sequences.
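
As a rough illustration of the 20-byte rule (this is not the OP's generator, which was not shown), a minimal Python sketch could format each entry with an SP LF end-of-line and show how a reader that steps in 20-byte strides misreads 19-byte entries. The helper names `xref_entry`, `build_xref`, and `lookup` are made up for this example; the offsets are the ones from the question.

```python
def xref_entry(offset: int, generation: int = 0, in_use: bool = True) -> bytes:
    """Format one cross reference entry (hypothetical helper)."""
    keyword = b"n" if in_use else b"f"
    # 10-digit offset, SP, 5-digit generation, SP, keyword, then a 2-byte EOL (SP LF)
    entry = b"%010d %05d %s \n" % (offset, generation, keyword)
    if len(entry) != 20:
        raise ValueError("cross reference entries must be exactly 20 bytes")
    return entry


def build_xref(offsets: list[int]) -> bytes:
    """Build a single-subsection table: the free entry for object 0, then one entry per offset."""
    parts = [b"xref\n", b"0 %d\n" % (len(offsets) + 1)]
    parts.append(xref_entry(0, 65535, in_use=False))
    parts.extend(xref_entry(o) for o in offsets)
    return b"".join(parts)


def lookup(entries: bytes, obj_num: int) -> bytes:
    """Read an entry the way a reader assuming fixed 20-byte entries would."""
    start = 20 * obj_num
    return entries[start:start + 20]


offsets = [15, 63, 148, 228, 340, 442, 499, 535]  # object offsets from the question
table = build_xref(offsets)
entries = table[len(b"xref\n0 9\n"):]             # skip the keyword and subsection header

# Simulate the broken file: a bare LF instead of SP LF makes every entry 19 bytes.
broken = entries.replace(b" \n", b"\n")

print(lookup(entries, 1))  # b'0000000015 00000 n \n'
print(lookup(broken, 1))   # b'000000015 00000 n\n00'
```

With the short entries, the slice for object 1 already starts one byte late and spills into the next entry, and the drift grows by one byte per object.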

mkl
  • Thanks Michael! kerZy Hart, you should unaccept my answer and accept mkl's answer. – Bruno Lowagie May 26 '14 at 07:56
  • Merci. But please do also try to not create extra `>>` as pointed out by Bruno. – mkl May 26 '14 at 08:04
  • @BrunoLowagie: I have no idea what I am doing, and I really never expected a reply from you both. I felt somewhat ashamed to ask these kinds of questions, but I was left with no options. Thanks, I am very happy with your analysis and comments. I won't stop asking questions on these topics. Cheers – kerZy Hart May 26 '14 at 08:13
  • I asked the same kind of questions to Aandi Inston back in 1999. I now know Aandi personally from the ISO committee meetings ;-) – Bruno Lowagie May 26 '14 at 08:14
  • @mkl: Victory! Victory! I added **SP LF** as the eol and it works fine. – kerZy Hart May 26 '14 at 11:10

Save the form with Adobe Reader and compare it at a binary level. You will discover a slight difference. For instance: the cross-reference table was rebuilt because you didn't take into account 'carriage return' characters, there was white space where you didn't expect it, etc...

Adobe Reader also fixes errors such as this one:

4 0 obj
<<
/ExtGState <</GS1 7 0 R>>
/ProcSet[/PDF/Text/ImageB/ImageC/ImageI]
/Font<< /F1 8 0 R >>
>>
>>

You have a double dictionary ending here (remove one `>>`). That's at least one error in the PDF you've copy/pasted.
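
A quick sanity check for this kind of slip could look like the following Python sketch. It is deliberately naive: it only counts `<<` and `>>` in an object body and ignores literal strings and stream data, where those byte pairs may legitimately occur. The function name `dict_delimiter_balance` is invented for this example.

```python
def dict_delimiter_balance(obj_body: bytes) -> int:
    """Return count of '<<' minus count of '>>'; 0 means balanced (naive check)."""
    return obj_body.count(b"<<") - obj_body.count(b">>")


# Body of object 4 exactly as posted in the question
posted = b"""<<
/ExtGState <</GS1 7 0 R>>
/ProcSet[/PDF/Text/ImageB/ImageC/ImageI]
/Font<< /F1 8 0 R >>
>>
>>"""

# Same body with the trailing "\n>>" removed
fixed = posted[:-3]

print(dict_delimiter_balance(posted))  # -1, one closing delimiter too many
print(dict_delimiter_balance(fixed))   # 0
```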

Bruno Lowagie
  • I corrected that error, but as you said, it is only one of the errors. I am writing the PDF file using a buffered writer, so I know which characters end up where in the file. I believe I am calculating the object offsets from exactly the characters I know about. – kerZy Hart May 23 '14 at 14:10
  • 1. What are carriage return characters? – kerZy Hart May 23 '14 at 14:10
  • 2. In which situations can unexpected white space arise, and how? – kerZy Hart May 23 '14 at 14:12
  • Carriage return means returning to the start of the line, but how does this apply to real-time I/O operations? – kerZy Hart May 23 '14 at 14:35
  • When you end a line, you sometimes have `\r\n` (e.g. this is typical on Windows) and sometimes you have `\n` (e.g. on Linux systems). If you don't know the difference, you are not ready to roll your own PDFs. Moreover: you didn't follow my advice! **Save your PDF! Compare the xref table!** – Bruno Lowagie May 23 '14 at 14:52
  • I am really sorry I didn't tell you. I did what you told me before, but I couldn't find the xref in the saved file, even after decrypting the PDF. I accept and respect your valuable opinion, and I know there is still a mountain to climb, but this is my small attempt. In PDF generation, you and Paulo are my role models. There are no problems on Linux when I open this PDF. – kerZy Hart May 23 '14 at 15:08
  • Hmm... there is a cross-reference table, but it's compressed. This is a feature that was introduced in PDF 1.5, so I guess the PDF version is updated too. Do you have Acrobat? If so, you can probably force it to save it as PDF 1.4. If not, please share your PDF. – Bruno Lowagie May 23 '14 at 15:12
  • Another test you could do: open the PDF using `PdfReader` and check if the document needed to be rebuilt using the `isRebuilt()` method. If it returns `true`, there's an error in your byte offset. Or: maybe you get an Exception that tells you what's wrong. – Bruno Lowagie May 23 '14 at 15:15
  • Could you please give me your e-mail ID? – kerZy Hart May 23 '14 at 15:24
  • Yes, isRebuilt() returns true. – kerZy Hart May 23 '14 at 15:35
  • OK, so now you've established that your byte offsets are wrong. – Bruno Lowagie May 23 '14 at 16:27
  • I have done one more simple test: I created a PDF using iText-2.1.7 and decrypted it using **"qpdf"**. Then I removed the xref table from the decrypted PDF and generated an xref table with my program. When I compared the xref table I generated with yours, both looked the same. – kerZy Hart May 26 '14 at 07:37
  • And what does Adobe Reader tell you? Does it keep on asking about Saving the document? Also: is the byte count identical for both files? – Bruno Lowagie May 26 '14 at 07:39
  • Yes, the byte counts are identical for both files. But I didn't replace your xref (from the iText decrypted PDF) with mine; I simply compared the xref tables. And Adobe Reader keeps asking me to save the document. – kerZy Hart May 26 '14 at 07:49
  • Then share the PDF, put it on a site somewhere we can download it from (Google Drive, Dropbox,...). – Bruno Lowagie May 26 '14 at 07:50
  • Sure sir, could you send me your mail ID? – kerZy Hart May 26 '14 at 07:54
  • I can share it with you through jumpShare. – kerZy Hart May 26 '14 at 07:54
  • There are many ways to share a PDF. Mail is so 20th century. Please use another channel. – Bruno Lowagie May 26 '14 at 07:54
  • It's no longer necessary. The mystery was solved. It was indeed a carriage return issue. – Bruno Lowagie May 26 '14 at 07:56