
I'm trying to implement a Java PDF/A validator using PDFBox. The file I'm dealing with is a digitally signed PDF (PAdES). Whether I use PDFBox or an online tool, the result is:

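        PreflightParser parser = new PreflightParser(file); // file: java.io.File of the signed PDF
        parser.parse();
        PreflightDocument document = parser.getPreflightDocument();
        document.validate();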

1.1 : Header Syntax error, First line must match %PDF-1.\d
1.1 : Header Syntax error, Second line must begin with '%' followed by at least 4 bytes greater than 127
1.0 : Syntax error, Missing end of file marker '%%EOF'
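These two header messages describe simple byte-level rules. As a rough illustration (not PDFBox Preflight's actual code), here is a minimal plain-Java sketch of what they check, assuming the whole file has already been read into a byte array, e.g. with Files.readAllBytes. The class PdfaHeaderRules and its methods are hypothetical names, and the sketch interprets "at least 4 bytes greater than 127" as the four bytes immediately after the '%'.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper: a plain-Java reading of the two PDF/A header rules
// quoted in the Preflight messages. Not PDFBox's implementation.
final class PdfaHeaderRules {

    // Rule 1: the whole first line must match %PDF-1.\d
    private static final Pattern FIRST_LINE = Pattern.compile("%PDF-1\\.\\d");

    // Returns line number 'index' (0-based) without its EOL; CR, LF and CRLF all end a line.
    static byte[] line(byte[] data, int index) {
        int start = 0;
        for (int current = 0; start <= data.length; current++) {
            int end = start;
            while (end < data.length && data[end] != '\n' && data[end] != '\r') {
                end++;
            }
            if (current == index) {
                return Arrays.copyOfRange(data, start, end);
            }
            // Treat CRLF as a single end-of-line marker.
            if (end + 1 < data.length && data[end] == '\r' && data[end + 1] == '\n') {
                end++;
            }
            start = end + 1;
        }
        return new byte[0]; // fewer lines than requested
    }

    static boolean firstLineOk(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost or merged.
        String first = new String(line(data, 0), StandardCharsets.ISO_8859_1);
        return FIRST_LINE.matcher(first).matches();
    }

    // Rule 2: second line = '%' followed by at least four bytes > 127.
    // This sketch checks the four bytes directly after the '%'.
    static boolean secondLineOk(byte[] data) {
        byte[] second = line(data, 1);
        if (second.length < 5 || second[0] != '%') {
            return false;
        }
        for (int i = 1; i <= 4; i++) {
            if ((second[i] & 0xFF) <= 127) {
                return false;
            }
        }
        return true;
    }
}
```

On the file shown below, firstLineOk would return false, because %PDF-1.4 only appears after other bytes instead of at the very start of the file.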

But when I open the file with any reader, its format is indeed PDF/A. Looking inside the PDF, though, the first line isn't %PDF-1.x and the last line isn't %%EOF. Maybe this is because the PDF is signed. Could that be? And if so, how can I get past this kind of validation for a signed PDF?

0ƒ;f    *†H†÷
 ƒ;V0ƒ;Q10
    `†He
 ƒ;‡=ƒ;‡8%PDF-1.4
%ÿÿÿÿ
1 0 obj
<<
...
....
.....

Note the "%PDF-1.4" at the end of the 4th line.

usr-local-ΕΨΗΕΛΩΝ
paul_333
Talk with the people who sent you that file. A PDF file has to start with %PDF and not with "0ƒ;f *†H†÷ ƒ;V0ƒ;Q10 `†He ƒ;‡=ƒ;‡8", even when signed. Re "But when I open the file with any reader, its format is indeed PDF/A": no, it isn't. The text in the blue bar is "the file claims compliance". A claim isn't a fact. – Tilman Hausherr Mar 17 '17 at 12:39

1 Answer


Fleshing out Tilman's comment a bit:

The header

According to the PDF specification ISO 32000-1:

7.5.2 File Header

The first line of a PDF file shall be a header consisting of the 5 characters %PDF- followed by a version number of the form 1.N, where N is a digit between 0 and 7.

As the 'shall' indicates, this is a requirement. Thus, your file is not only not a valid PDF/A document; it is not even a valid PDF.

If you wonder why Adobe Reader does not complain: in Annex H.3, Implementation Notes, of its PDF Reference for PDF 1.7, Adobe indicates that its software treats the PDF header requirement quite laxly:

  1. Acrobat viewers require only that the header appear somewhere within the first 1024 bytes of the file.

Thus, you have an invalid PDF which Adobe viewers display nonetheless.
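The gap between the two rules can be sketched in plain Java. In this hypothetical helper (HeaderLocator is not a library class), strictHeader approximates ISO 32000-1, 7.5.2 by requiring %PDF-1.N with N from 0 to 7 at offset 0, while adobeLenientHeader follows the quoted implementation note and accepts the marker anywhere in the first 1024 bytes. The strict check only inspects the first eight bytes and does not verify what follows the version digit.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper contrasting the ISO 32000-1 header rule with Adobe's lenient one.
final class HeaderLocator {

    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Offset of the first "%PDF-" that starts within the first 'limit' bytes, or -1.
    static int indexOfHeader(byte[] data, int limit) {
        for (int i = 0; i < limit && i + MARKER.length <= data.length; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length && match; j++) {
                match = data[i + j] == MARKER[j];
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    // Strict: "%PDF-1.N" (N = 0..7) must be at the very start of the file.
    static boolean strictHeader(byte[] data) {
        return data.length >= 8
                && indexOfHeader(data, 1) == 0
                && data[5] == '1' && data[6] == '.'
                && data[7] >= '0' && data[7] <= '7';
    }

    // Lenient (Adobe implementation note): the marker may appear anywhere in the first 1024 bytes.
    static boolean adobeLenientHeader(byte[] data) {
        return indexOfHeader(data, 1024) >= 0;
    }
}
```

For a file like the one in the question, strictHeader returns false while adobeLenientHeader returns true, provided the %PDF- marker starts within the first 1024 bytes. That is exactly the situation in which a viewer displays the file but a validator rejects it.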

The footer

Again, according to the PDF specification:

7.5.5 File Trailer

The trailer of a PDF file enables a conforming reader to quickly find the cross-reference table and certain special objects. Conforming readers should read a PDF file from its end. The last line of the file shall contain only the end-of-file marker, %%EOF.

Again, Adobe viewers accept some files that fail to conform to this requirement; according to the Adobe PDF Reference:

  1. Acrobat viewers require only that the %%EOF marker appear somewhere within the last 1024 bytes of the file.

As the last line of your file is not %%EOF, this is another requirement for valid PDF files that it fails to fulfill.
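The same contrast applies to the end of the file. A minimal sketch, again with hypothetical names: strictEof requires %%EOF alone on the last line (tolerating one trailing end-of-line marker is an assumption of this sketch), while adobeLenientEof only requires %%EOF somewhere within the last 1024 bytes, as in the implementation note quoted above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper contrasting ISO 32000-1, 7.5.5 with Adobe's lenient reading.
final class TrailerCheck {

    private static final byte[] EOF = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Strict: the last line contains only %%EOF (one trailing EOL is tolerated here).
    static boolean strictEof(byte[] data) {
        int end = data.length;
        // Drop a single trailing end-of-line marker: LF, CR or CRLF.
        if (end > 0 && data[end - 1] == '\n') {
            end--;
        }
        if (end > 0 && data[end - 1] == '\r') {
            end--;
        }
        int start = end - EOF.length;
        if (start < 0 || !matchesAt(data, start)) {
            return false;
        }
        // %%EOF must start its own line: preceded by an EOL or the start of the file.
        return start == 0 || data[start - 1] == '\n' || data[start - 1] == '\r';
    }

    // Lenient: %%EOF appears anywhere within the last 1024 bytes.
    static boolean adobeLenientEof(byte[] data) {
        for (int i = Math.max(0, data.length - 1024); i + EOF.length <= data.length; i++) {
            if (matchesAt(data, i)) {
                return true;
            }
        }
        return false;
    }

    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < EOF.length; j++) {
            if (data[offset + j] != EOF[j]) {
                return false;
            }
        }
        return true;
    }
}
```

Bytes appended after %%EOF make strictEof fail, while adobeLenientEof still succeeds as long as the marker stays within the last 1024 bytes.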


Concerning your claim:

But when I open the file with any reader, its format is indeed PDF/A

Adobe Reader does not check whether a file actually is valid PDF/A; it only reports what the file claims to be.

mkl