2

When I digitally sign PDF/A-2b documents with iText v5.5.11, the documents get corrupted, meaning they are no longer valid PDF/A documents. The following rule is violated: https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Parts-2-and-3-rules#rule-643-1

According to the rule linked above, the digest is invalid, so here is the code segment that computes the digest while signing a PDF document with iText:

        // Make the digest
        InputStream data;
        try {

            data = signatureAppearance.getRangeStream();
        } catch (IOException e) {
            String message = "MessageDigest error for signature input, type: IOException";
            signLogger.logError(message, e);
            throw new CustomException(message, e);
        }
        MessageDigest messageDigest;
        try {
            messageDigest = MessageDigest.getInstance("SHA1");

        } catch (NoSuchAlgorithmException ex) {
            String message = "MessageDigest error for signature input, type: NoSuchAlgorithmException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        }
        byte[] buf = new byte[8192];
        int n;
        try {
            while ((n = data.read(buf)) > 0) {
                messageDigest.update(buf, 0, n);
            }
        } catch (IOException ex) {
            String message = "MessageDigest update error for signature input, type: IOException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        }
        byte[] hash = messageDigest.digest();
        // If we add a time stamp:
        // Create the signature
        PdfPKCS7 sgn;
        try {

            sgn = new PdfPKCS7(key, chain, configuration.getSignCertificate().getSignatureHashAlgorithm().value() , null, new BouncyCastleDigest(), false);
        } catch (InvalidKeyException ex) {
            String message = "Certificate PDF sign error for signature input, type: InvalidKeyException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        } catch (NoSuchProviderException ex) {
            String message = "Certificate PDF sign error for signature input, type: NoSuchProviderException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        } catch (NoSuchAlgorithmException ex) {
            String message = "Certificate PDF sign error for signature input, type: NoSuchAlgorithmException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        } catch (Exception ex) {
            String message = "Certificate PDF sign error for signature input, type: Exception";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        }
        byte[] sh = sgn.getAuthenticatedAttributeBytes(hash, null,null, MakeSignature.CryptoStandard.CMS);
        try {
            sgn.update(sh, 0, sh.length);
        } catch (java.security.SignatureException ex) {
            String message = "Certificate PDF sign error for signature input, type: SignatureException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        }
        byte[] encodedSig = sgn.getEncodedPKCS7(hash);
        if (contentEstimated + 2 < encodedSig.length) {
            String message = "The estimated size for the signature is smaller than the required one. Terminating request..";
            signLogger.log("ERROR", message);
            throw new CustomException(message);
        }
        byte[] paddedSig = new byte[contentEstimated];
        System.arraycopy(encodedSig, 0, paddedSig, 0, encodedSig.length);
        // Replace the contents
        PdfDictionary dic2 = new PdfDictionary();
        dic2.put(PdfName.CONTENTS, new PdfString(paddedSig).setHexWriting(true));
        try {
            signatureAppearance.close(dic2);
        } catch (IOException ex) {
            String message = "PdfSignatureAppearance close error for signature input, type: IOException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        } catch (DocumentException ex) {
            String message = "PdfSignatureAppearance close error for signature input, type: DocumentException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        }

For PDF/A validation I use the veraPDF library.

It may also be helpful to mention that while veraPDF reports the signed file as an invalid PDF/A document, the Adobe Reader validation tools report that the PDF/A document is not corrupted.

Any help would be much appreciated.

Thorbjørn Ravn Andersen
  • Can you reproduce this with iText v7.0.2? Can you share a sample pdf? – Amedee Van Gasse Apr 20 '17 at 14:38
  • Sample pdf can be found [here](https://www.docdroid.net/Id5kxvO/samplepdfadocsigned.pdf.html). – Peter Veselinović Apr 20 '17 at 14:55
  • The byte range in your sample file does not violate that rule. If VeraPDF complains, it is wrong. BTW, Adobe Reader also would complain about signatures not covering their whole revision minus the signature value, it does not even accept such signatures in regular PDFs. – mkl Apr 20 '17 at 15:41
  • As an aside: Is there any specific reason you use SHA1? In many regulated signature contexts this algorithm is no longer considered secure, resulting in limited legal value, if any at all. – mkl Apr 20 '17 at 16:37
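
On the SHA1 aside in the comments above: a minimal sketch of the same digest loop with the algorithm name passed in, so that SHA-256 can be used instead of the hard-coded "SHA1". `StreamDigestSketch`, `digest` and `toHex` are hypothetical names, not part of iText, and only JDK classes are used. The sketch assumes the caller owns the stream (for example the one returned by `signatureAppearance.getRangeStream()`) and that whatever algorithm is chosen here is also the one configured for the signature container.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration only; not part of iText. */
final class StreamDigestSketch {

    /** Reads the stream to its end and returns the digest of everything read. */
    static byte[] digest(InputStream data, String algorithm) {
        try {
            MessageDigest messageDigest = MessageDigest.getInstance(algorithm);
            byte[] buf = new byte[8192];
            int n;
            // read(byte[]) signals the end of the stream with -1
            while ((n = data.read(buf)) != -1) {
                messageDigest.update(buf, 0, n);
            }
            return messageDigest.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read the data to digest", e);
        }
    }

    /** Lower-case hex rendering, only used to print the result. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        InputStream data = new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII));
        System.out.println(toHex(digest(data, "SHA-256")));
    }
}
```

This collapses the three try/catch blocks of the question into one method; the unchecked exceptions stand in for the question's `CustomException`.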

3 Answers

4

When digitally signing document with itext v5.5.11 PDF/A-2b documents get corrupted - meaning they are no longer valid as PDF/A documents. Following rule is violated: https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Parts-2-and-3-rules#rule-643-1

While this is indeed what veraPDF claims, this is wrong; iText creates signatures covering their whole revision minus the space reserved for the signature container.

The reason for this incorrect violation detection is an error in veraPDF.

How veraPDF determines whether the signed byte ranges are valid

Both veraPDF versions (the one based on the greenfield parser and the one based on PDFBox) attempt to determine a nominal byte ranges value and compare it to the actual one. This is how they determine the nominal value:

public long[] getByteRangeBySignatureOffset(long signatureOffset) throws IOException {
    pdfSource.seek(signatureOffset);
    skipID();
    byteRange[0] = 0;
    parseDictionary();
    byteRange[3] = getOffsetOfNextEOF(byteRange[2]) - byteRange[2];
    return byteRange;
}

private long getOffsetOfNextEOF(long currentOffset) throws IOException {
    byte[] buffer = new byte[EOF_STRING.length];
    pdfSource.seek(currentOffset + document.getHeaderOffset());
    readWholeBuffer(pdfSource, buffer);
    pdfSource.rewind(buffer.length - 1);
    while (!Arrays.equals(buffer, EOF_STRING)) {    //TODO: does it need to be optimized?
        readWholeBuffer(pdfSource, buffer);
        if (pdfSource.isEOF()) {
            pdfSource.seek(currentOffset + document.getHeaderOffset());
            return pdfSource.length();
        }
        pdfSource.rewind(buffer.length - 1);
    }
    long result = pdfSource.getPosition() + buffer.length - 1;  // offset of byte after 'F'
    pdfSource.seek(currentOffset + document.getHeaderOffset());
    return result - 1;
}

(PDFBox based SignatureParser class)

public long[] getByteRangeBySignatureOffset(long signatureOffset) throws IOException {
    source.seek(signatureOffset);
    skipID();
    byteRange[0] = 0;
    parseDictionary();
    byteRange[3] = getOffsetOfNextEOF(byteRange[2]) - byteRange[2];
    return byteRange;
}

private long getOffsetOfNextEOF(long currentOffset) throws IOException {
    byte[] buffer = new byte[EOF_STRING.length];
    source.seek(currentOffset + document.getHeader().getHeaderOffset());
    source.read(buffer);
    source.unread(buffer.length - 1);
    while (!Arrays.equals(buffer, EOF_STRING)) {    //TODO: does it need to be optimized?
        source.read(buffer);
        if (source.isEOF()) {
            source.seek(currentOffset + document.getHeader().getHeaderOffset());
            return source.getStreamLength();
        }
        source.unread(buffer.length - 1);
    }
    long result = source.getOffset() - 1 + buffer.length;   // byte right after 'F'
    source.seek(currentOffset + document.getHeader().getHeaderOffset());
    return result - 1;
}

(greenfield parser based SignatureParser)

Essentially both implementations do the same here: starting at the signature, they look for the next occurrence of the end-of-file marker %%EOF and attempt to complete the nominal byte ranges value so that the second range ends with that marker.

Why this is wrong

There are multiple reasons why this way of determining the nominal signed byte ranges value is wrong:

  1. According to the PDF/A specifications,

    No data can follow the last end-of-file marker except a single optional end-of-line marker as described in ISO 32000-1:2008, 7.5.5.

    Thus, the offset directly after the next end-of-file marker %%EOF is not necessarily already the end of the signed revision; the correct offset might be the one after a following end-of-line marker! And as a PDF end-of-line marker can be either a single CR or a single LF or a CRLF combination, this means that veraPDF picks one of the three possible offsets and claims it to be the nominal end of the revision and, therefore, the nominal end of the signed byte ranges.

  2. It is possible (even though hardly ever seen) that a signature value is prepared in one revision (ending in an end-of-file marker), then some data are appended in an incremental update giving rise to a new revision (ending in another end-of-file marker), and then the signature value is filled in with values signing the document including this new revision.

    As veraPDF uses the next end-of-file marker after the signature dictionary, in this situation veraPDF actually picks the wrong end-of-file marker.

  3. The end-of-file marker %%EOF syntactically actually is merely a comment with a special meaning at the end of a PDF / revision, and comments are allowed nearly everywhere in a PDF outside PDF strings, PDF stream data, and PDF cross reference tables. Thus, the byte sequence %%EOF can occur as a regular comment or as a non-comment content of a string or stream any number of times between the signature value dictionary and the actual end of the signed revision.

    If there is such an occurrence, veraPDF picks a byte sequence as an end-of-file marker which never has been meant as an end of something.

Furthermore, unless the actual end of the file is reached in the loop (and pdfSource.length() / source.getStreamLength() is returned), the result appears to be off by one: the - 1 in return result - 1 does not match how the result is used.
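
To illustrate item 3 of the list above, here is a small sketch (not veraPDF code) in which the first %%EOF after the signature dictionary sits inside a PDF string and is not the end of the revision. The byte content is a toy fragment that only resembles a PDF, and `EofSearchSketch` with its methods is a hypothetical name introduced for this example.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: the first "%%EOF" after a signature need not end the revision. */
final class EofSearchSketch {

    static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the first index >= from at which pattern occurs in data, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i + pattern.length <= data.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the last index at which pattern occurs in data, or -1. */
    static int lastIndexOf(byte[] data, byte[] pattern) {
        for (int i = data.length - pattern.length; i >= 0; i--) {
            if (indexOf(data, pattern, i) == i) {
                return i;
            }
        }
        return -1;
    }

    /** Toy bytes that merely look like a PDF; object 2 holds "%%EOF" inside a string. */
    static byte[] toyFile() {
        String text = "%PDF-1.7\n"
                + "1 0 obj\n<</Type/Sig/Contents<00>/ByteRange[0 0 0 0]>>\nendobj\n"
                + "2 0 obj\n(a string containing %%EOF)\nendobj\n"
                + "trailer\n<<>>\n%%EOF\n";
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] file = toyFile();
        int signature = indexOf(file, "/Type/Sig".getBytes(StandardCharsets.US_ASCII), 0);
        int nextMarker = indexOf(file, EOF_MARKER, signature);
        int finalMarker = lastIndexOf(file, EOF_MARKER);
        System.out.println("next %%EOF after signature at " + nextMarker);
        System.out.println("final %%EOF at " + finalMarker);
    }
}
```

A search for the next marker stops at the string in object 2, while the marker that actually closes the toy file comes later; note that even the last marker is only correct here because the toy file has a single revision.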

veraPDF versions

I checked against the current 1.5.0-SNAPSHOT versions of veraPDF which are tagged:

  • veraPDF-pdfbox-validation 1.5.4
  • veraPDF-validation 1.5.2
  • veraPDF-parser 1.5.1

The OP's sample document

The sample document provided by the OP has a LF after the end-of-file marker. Due to this and the off-by-one issue mentioned above, veraPDF determines a nominal signed byte ranges end which is two bytes short.
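
The two-byte difference can be reproduced with a little arithmetic. The sketch below is not veraPDF code: it mirrors only the `result - 1` arithmetic of the methods quoted above, takes their code comments at face value (result is the offset of the byte right after 'F'), and assumes a header offset of 0. `NominalEndSketch`, its methods and the toy bytes are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical arithmetic demo for a file ending in "%%EOF" plus a single LF. */
final class NominalEndSketch {

    /** Length of the second range as the quoted arithmetic would compute it. */
    static long nominalSecondLength(byte[] file, int secondRangeStart) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices are byte offsets.
        String text = new String(file, StandardCharsets.ISO_8859_1);
        int marker = text.indexOf("%%EOF", secondRangeStart);
        if (marker < 0) {
            return file.length - secondRangeStart; // no marker found: use the file length
        }
        long afterF = marker + 5L;   // offset of the byte right after 'F'
        long returned = afterF - 1;  // the quoted methods return result - 1
        return returned - secondRangeStart;
    }

    /** Length of the second range when the signature covers the file to its very end. */
    static long actualSecondLength(byte[] file, int secondRangeStart) {
        return file.length - secondRangeStart;
    }

    public static void main(String[] args) {
        byte[] file = "%PDF-1.7\n<00000000>\ntrailer\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);
        int secondRangeStart = 19; // just past the closing '>' of the toy signature value
        long nominal = nominalSecondLength(file, secondRangeStart);
        long actual = actualSecondLength(file, secondRangeStart);
        System.out.println("nominal=" + nominal + " actual=" + actual);
    }
}
```

One byte of the difference comes from the trailing LF, the other from the `- 1`.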

mkl
2

As discussed above, we just released the hotfix for veraPDF 1.4 that addresses the issues in this discussion. The new version is available for download: http://downloads.verapdf.org/rel/1.4/verapdf-1.4.5-installer.zip

In particular, iText-signed PDF/A-2 documents seem to pass veraPDF validation just fine.

  • Great! Just a hint, though, how stack overflow is meant to be used: If you have an addendum to your existing answer (like in this case here), you usually edit it into that (there is an edit link right underneath it); you only create a new answer to present and elaborate on a completely different approach. As you see the answers are not simply sorted by date but also by a number of other criteria, among them acceptance and votes. Your multiple answers on the same approach, therefore, can too easily be separated by that sort order. – mkl Apr 28 '17 at 08:03
  • BTW, the `SignatureParser` fix looks different in 1.4.x and in 1.5.x, in 1.4.x it looks more complete, also taking a single CR into account, while not so in 1.5.x. – mkl Apr 28 '17 at 09:09
0

I do agree with the analysis of how veraPDF checks the ByteRange at the moment. Indeed, it assumes the file terminates exactly at the %%EOF marker immediately following the signature field.

The reason is quite simple. The document can be signed sequentially by several people, and can still be a valid PDF/A-2B document. When the second signature is generated, it will incrementally update the file containing the first signature.

So, if we interpret the term file in the PDF/A-2B requirements literally:

When computing the digest for the file, it shall be computed over the entire file, including the signature dictionary but excluding the PDF Signature itself. This range is then indicated by the ByteRange entry of the signature dictionary.

we would never be able to create a valid PDF/A file with multiple signatures. This was clearly not the intention of the PDF/A-2 standard.

The PDF file is usually understood as the byte range from the leading %PDF to the trailing %%EOF, which allows, for example, for PDF files that are part of a bigger byte stream (e.g., mail attachments). This is what the veraPDF implementation is based on.

I do agree, however, that this approach does not take into account the optional end-of-line sequence after %%EOF. I've created the corresponding issue for veraPDF: https://github.com/veraPDF/veraPDF-validation/issues/166

This leaves an interesting question, though: what is a valid ByteRange for the first signature if the document has more signatures? I believe all of these cases:

  • ByteRange covers the file up to the next %%EOF marker
  • ByteRange covers the file up to the next %%EOF marker + a single CR character
  • ByteRange covers the file up to the next %%EOF marker + a single LF character
  • ByteRange covers the file up to the next %%EOF marker + a two-byte CR+LF sequence

should be allowed.
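
The four cases listed above could be checked roughly as follows. This is only a sketch under stated assumptions, not the veraPDF fix: `EofEndingSketch` and `isAcceptableEnd` are hypothetical names, `markerOffset` is assumed to be the offset of the first '%' of the relevant %%EOF marker, and `rangeEnd` is the exclusive end of the second byte range (its start plus its length).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical check: does a byte range end at %%EOF or at %%EOF plus one end-of-line marker? */
final class EofEndingSketch {

    private static final int MARKER_LENGTH = 5; // "%%EOF"

    static boolean isAcceptableEnd(byte[] file, int markerOffset, long rangeEnd) {
        long afterMarker = (long) markerOffset + MARKER_LENGTH;
        if (rangeEnd > file.length || afterMarker > file.length) {
            return false; // the range or the marker would reach beyond the file
        }
        if (rangeEnd == afterMarker) {
            return true; // ends directly after %%EOF
        }
        if (rangeEnd == afterMarker + 1) {
            byte b = file[(int) afterMarker];
            return b == '\r' || b == '\n'; // %%EOF + single CR or single LF
        }
        if (rangeEnd == afterMarker + 2) {
            return file[(int) afterMarker] == '\r'
                    && file[(int) afterMarker + 1] == '\n'; // %%EOF + CR LF
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] file = "X%%EOF\r\n".getBytes(StandardCharsets.US_ASCII);
        for (long end = 5; end <= 8; end++) {
            System.out.println("rangeEnd=" + end + " acceptable=" + isAcceptableEnd(file, 1, end));
        }
    }
}
```

The sketch says nothing about which %%EOF occurrence is the right one to test against; it only covers the end-of-line variants once a marker has been chosen.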

  • Items 2. and 3. in my answer should make clear why using *"the next following %%EOF marker"* at all is a very bad idea: This byte sequence can occur multiple times before the actual correct end of the signed revision. In my opinion all you can do is check whether (A) the byte ranges consist of two non-overlapping ranges, (B) the gap between these two ranges is exactly the signature **Contents** value, (C) the range before the gap starts at offset 0, and (D) the file segment from the start of the PDF up to the end of the range after the gap is a valid PDF (not necessarily PDF/A). – mkl Apr 21 '17 at 20:26
  • Any further conditions (e.g. requiring the end to coincide with the next occurrence of the bytes `%%EOF`) assume too much about the history of the PDF file. If I read ISO-19005 correctly (I only have part 3 at hand), the signatures in a valid PDF/A-* are not required to be valid signatures; thus, quite some wild changes are possible in and after the signed revision... – mkl Apr 21 '17 at 20:41
  • Sure, cryptographic validity of signatures is outside of the scope of PDF/A. Thanks for the alternative suggestion (A)-(D) for implementing the ByteRange check, though it still leaves open what counts as a valid PDF document. – Boris Doubrov Apr 22 '17 at 05:24
  • So, we might first go for a quick fix along the lines I suggested in my first reply, which will be OK for most if not all real-world PDFs, and do a bit more research for the "proper" fix. Of course, any alternative PRs are welcome. – Boris Doubrov Apr 22 '17 at 05:38
  • Yes, the quick fix (plus looking at the off-by-one I think I see) takes care of most wrong evaluations here. – mkl Apr 22 '17 at 09:52
  • *"cryptographic validity of signatures is outside of the scope of PDF/A"* - I actually did not only mean the cryptographic validity. According to ISO 32000, the presence of specific signatures (e.g. with MDP transformation) restricts the allowed changes in later additions to the document. But if these restrictions are violated, this should be interpreted as an invalidated signature (not of interest for PDF/A evaluation), not an invalid PDF (which would be of interest for PDF/A evaluation). – mkl Apr 22 '17 at 10:00
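
Conditions (A) to (C) suggested in the comments above can be sketched as a purely structural check. This is an illustration under assumptions, not veraPDF or iText code: `ByteRangeShapeSketch` and `hasExpectedShape` are hypothetical names, and `contentsStart` / `contentsEnd` are assumed to be the offsets of the opening `<` of the Contents hex string and of the byte just past its closing `>`. Condition (D), that the covered segment is itself a valid PDF, needs a full parser and is left out; the sketch adds only the check that the second range stays inside the file.

```java
/** Hypothetical structural ByteRange check covering conditions (A) to (C). */
final class ByteRangeShapeSketch {

    static boolean hasExpectedShape(long[] byteRange, long contentsStart,
                                    long contentsEnd, long fileLength) {
        if (byteRange == null || byteRange.length != 4) {
            return false; // exactly two (offset, length) pairs are expected
        }
        long firstStart = byteRange[0];
        long firstLength = byteRange[1];
        long secondStart = byteRange[2];
        long secondLength = byteRange[3];
        if (firstLength < 0 || secondLength < 0) {
            return false;
        }
        long firstEnd = firstStart + firstLength;    // exclusive end of the first range
        long secondEnd = secondStart + secondLength; // exclusive end of the second range

        boolean nonOverlapping = firstEnd <= secondStart;        // (A)
        boolean gapIsContents = firstEnd == contentsStart
                && secondStart == contentsEnd;                   // (B)
        boolean startsAtZero = firstStart == 0;                  // (C)
        boolean insideFile = secondEnd <= fileLength;            // extra sanity check
        return nonOverlapping && gapIsContents && startsAtZero && insideFile;
    }

    public static void main(String[] args) {
        long[] byteRange = {0, 100, 200, 50};
        System.out.println(hasExpectedShape(byteRange, 100, 200, 250));
    }
}
```

Unlike a search for %%EOF, this check makes no assumption about what follows the signed revision.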