iText digital signature corrupts PDF/A 2b

Question

When digitally signing a document with iText v5.5.11, PDF/A-2b documents get corrupted, meaning they are no longer valid as PDF/A documents. The following rule is violated: https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Parts-2-and-3-rules#rule-643-1

According to the link above, the digest is invalid, so here is the code segment that computes the digest while signing a PDF document with iText:

        // Make the digest
        InputStream data;
        try {

            data = signatureAppearance.getRangeStream();
        } catch (IOException e) {
            String message = "MessageDigest error for signature input, type: IOException";
            signLogger.logError(message, e);
            throw new CustomException(message, e);
        }
        MessageDigest messageDigest;
        try {
            messageDigest = MessageDigest.getInstance("SHA1");

        } catch (NoSuchAlgorithmException ex) {
            String message = "MessageDigest error for signature input, type: NoSuchAlgorithmException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        }
        byte[] buf = new byte[8192];
        int n;
        try {
            while ((n = data.read(buf)) > 0) {
                messageDigest.update(buf, 0, n);
            }
        } catch (IOException ex) {
            String message = "MessageDigest update error for signature input, type: IOException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        }
        byte[] hash = messageDigest.digest();
        // If we add a time stamp:
        // Create the signature
        PdfPKCS7 sgn;
        try {

            sgn = new PdfPKCS7(key, chain, configuration.getSignCertificate().getSignatureHashAlgorithm().value() , null, new BouncyCastleDigest(), false);
        } catch (InvalidKeyException ex) {
            String message = "Certificate PDF sign error for signature input, type: InvalidKeyException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        } catch (NoSuchProviderException ex) {
            String message = "Certificate PDF sign error for signature input, type: NoSuchProviderException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        } catch (NoSuchAlgorithmException ex) {
            String message = "Certificate PDF sign error for signature input, type: NoSuchAlgorithmException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        } catch (Exception ex) {
            String message = "Certificate PDF sign error for signature input, type: Exception";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        }
        byte[] sh = sgn.getAuthenticatedAttributeBytes(hash, null,null, MakeSignature.CryptoStandard.CMS);
        try {
            sgn.update(sh, 0, sh.length);
        } catch (java.security.SignatureException ex) {
            String message = "Certificate PDF sign error for signature input, type: SignatureException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        }
        byte[] encodedSig = sgn.getEncodedPKCS7(hash);
        if (contentEstimated + 2 < encodedSig.length) {
            String message = "The estimated size for the signature is smaller than the required one. Terminating request..";
            signLogger.log("ERROR", message);
            throw new CustomException(message);
        }
        byte[] paddedSig = new byte[contentEstimated];
        System.arraycopy(encodedSig, 0, paddedSig, 0, encodedSig.length);
        // Replace the contents
        PdfDictionary dic2 = new PdfDictionary();
        dic2.put(PdfName.CONTENTS, new PdfString(paddedSig).setHexWriting(true));
        try {
            signatureAppearance.close(dic2);
        } catch (IOException ex) {
            String message = "PdfSignatureAppearance close error for signature input, type: IOException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        } catch (DocumentException ex) {
            String message = "PdfSignatureAppearance close error for signature input, type: DocumentException";
            signLogger.logError(message, ex);
            throw new CustomException(message, ex);
        }
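
The digest step can be written more compactly. The following is a minimal sketch that assumes the algorithm name is passed in rather than hard-coded, so that the same value can be given to both `MessageDigest` and `PdfPKCS7`. `RangeDigest` and its `digest` method are hypothetical names, not part of iText.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of iText: hashes a stream with a caller-chosen algorithm.
final class RangeDigest {

    static byte[] digest(InputStream data, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buf = new byte[8192];
        int n;
        // try-with-resources closes the stream even if reading fails
        try (InputStream in = data) {
            // read() returns -1 once the end of the stream is reached
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the stream returned by signatureAppearance.getRangeStream()
        byte[] sample = "signed byte ranges".getBytes(StandardCharsets.US_ASCII);
        byte[] hash = digest(new ByteArrayInputStream(sample), "SHA-256");
        System.out.println(hash.length); // SHA-256 digests are 32 bytes long
    }
}
```

In the signing code, the argument would be the range stream, and the algorithm name would be the one that is also handed to the `PdfPKCS7` constructor.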

For PDF/A validation I use the veraPDF library.

It may also be helpful to mention that while the veraPDF library reports the PDF/A document as corrupted, the Adobe Reader validation tools report that the PDF/A document isn't corrupted.

Any help would be much appreciated.

Can you reproduce this with iText v7.0.2? Can you share a sample pdf? — Amedee Van Gasse, Apr 20 '17 at 14:38
Sample pdf can be found [here](https://www.docdroid.net/Id5kxvO/samplepdfadocsigned.pdf.html). — Peter Veselinović, Apr 20 '17 at 14:55
The byte range in your sample file does not violate that rule. If VeraPDF complains, it is wrong. BTW, Adobe Reader also would complain about signatures not covering their whole revision minus the signature value, it does not even accept such signatures in regular PDFs. — mkl, Apr 20 '17 at 15:41
As an aside: Is there any specific reason you use SHA1? In many regulated signature contexts this algorithm is not considered secure anymore, resulting in a limited legal value, if any at all. — mkl, Apr 20 '17 at 16:37

score 4 · Accepted Answer · answered Apr 21 '17 at 08:53

When digitally signing document with itext v5.5.11 PDF/A-2b documents get corrupted - meaning they are no longer valid as PDF/A documents. Following rule is violated: https://github.com/veraPDF/veraPDF-validation-profiles/wiki/PDFA-Parts-2-and-3-rules#rule-643-1

While this is indeed what veraPDF claims, this is wrong; iText creates signatures covering their whole revision minus the space reserved for the signature container.

The reason for this incorrect violation detection is an error in veraPDF.
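
To make "covering the whole revision minus the reserved space" concrete, here is a minimal sketch of such a check. It assumes the signature sits in the last revision, so that the end of the revision equals the end of the file. `ByteRangeCoverage` is a hypothetical name and is not taken from iText or veraPDF.

```java
// Hypothetical checker, not taken from iText or veraPDF.
final class ByteRangeCoverage {

    /**
     * @param byteRange  the four /ByteRange values: start1, length1, start2, length2
     * @param fileLength total length in bytes of the file (assumed to end with the signed revision)
     */
    static boolean coversWholeFileExceptGap(long[] byteRange, long fileLength) {
        if (byteRange == null || byteRange.length != 4) {
            return false;
        }
        long start1 = byteRange[0];
        long length1 = byteRange[1];
        long start2 = byteRange[2];
        long length2 = byteRange[3];

        boolean startsAtZero = start1 == 0;
        // The gap between the two ranges holds the signature container
        boolean hasGap = length1 > 0 && start2 > start1 + length1;
        // The second range must run exactly to the end of the file
        boolean endsAtFileEnd = length2 > 0 && start2 + length2 == fileLength;
        return startsAtZero && hasGap && endsAtFileEnd;
    }

    public static void main(String[] args) {
        long[] byteRange = {0, 100, 200, 300};
        System.out.println(coversWholeFileExceptGap(byteRange, 500)); // prints "true"
    }
}
```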

How veraPDF determines whether the signed byte ranges are valid

Both veraPDF versions (the one based on the greenfield parser and the one based on PDFBox) attempt to determine a nominal byte ranges value and compare it to the actual one. This is how they determine the nominal value:

public long[] getByteRangeBySignatureOffset(long signatureOffset) throws IOException {
    pdfSource.seek(signatureOffset);
    skipID();
    byteRange[0] = 0;
    parseDictionary();
    byteRange[3] = getOffsetOfNextEOF(byteRange[2]) - byteRange[2];
    return byteRange;
}

private long getOffsetOfNextEOF(long currentOffset) throws IOException {
    byte[] buffer = new byte[EOF_STRING.length];
    pdfSource.seek(currentOffset + document.getHeaderOffset());
    readWholeBuffer(pdfSource, buffer);
    pdfSource.rewind(buffer.length - 1);
    while (!Arrays.equals(buffer, EOF_STRING)) {    //TODO: does it need to be optimized?
        readWholeBuffer(pdfSource, buffer);
        if (pdfSource.isEOF()) {
            pdfSource.seek(currentOffset + document.getHeaderOffset());
            return pdfSource.length();
        }
        pdfSource.rewind(buffer.length - 1);
    }
    long result = pdfSource.getPosition() + buffer.length - 1;  // offset of byte after 'F'
    pdfSource.seek(currentOffset + document.getHeaderOffset());
    return result - 1;
}

(PDFBox based SignatureParser class)

public long[] getByteRangeBySignatureOffset(long signatureOffset) throws IOException {
    source.seek(signatureOffset);
    skipID();
    byteRange[0] = 0;
    parseDictionary();
    byteRange[3] = getOffsetOfNextEOF(byteRange[2]) - byteRange[2];
    return byteRange;
}

private long getOffsetOfNextEOF(long currentOffset) throws IOException {
    byte[] buffer = new byte[EOF_STRING.length];
    source.seek(currentOffset + document.getHeader().getHeaderOffset());
    source.read(buffer);
    source.unread(buffer.length - 1);
    while (!Arrays.equals(buffer, EOF_STRING)) {    //TODO: does it need to be optimized?
        source.read(buffer);
        if (source.isEOF()) {
            source.seek(currentOffset + document.getHeader().getHeaderOffset());
            return source.getStreamLength();
        }
        source.unread(buffer.length - 1);
    }
    long result = source.getOffset() - 1 + buffer.length;   // byte right after 'F'
    source.seek(currentOffset + document.getHeader().getHeaderOffset());
    return result - 1;
}

(greenfield parser based SignatureParser)

Essentially both implementations do the same here: starting at the signature, they look for the next occurrence of the end-of-file marker %%EOF and attempt to complete the nominal byte ranges value so that the second range ends with that marker.

Why this is wrong

There are multiple reasons why this way of determining the nominal signed byte ranges value is wrong:

- According to the PDF/A specifications,

  > No data can follow the last end-of-file marker except a single optional end-of-line marker as described in ISO 32000-1:2008, 7.5.5.

  Thus, the offset directly after the next end-of-file marker %%EOF is not necessarily the end of the signed revision; the correct offset might be the one after a following end-of-line marker. And as a PDF end-of-line marker can be either a single CR, a single LF, or a CRLF combination, veraPDF picks just one of the possible offsets and claims it to be the nominal end of the revision and, therefore, the nominal end of the signed byte ranges.

- It is possible (even though hardly ever seen) that a signature value is prepared in one revision (ending in an end-of-file marker), then some data are appended in an incremental update giving rise to a new revision (ending in another end-of-file marker), and then the signature value is filled in with values signing the document including this new revision.

  As veraPDF uses the next end-of-file marker after the signature dictionary, in this situation veraPDF actually picks the wrong end-of-file marker.

- The end-of-file marker %%EOF syntactically is merely a comment with a special meaning at the end of a PDF or revision, and comments are allowed nearly everywhere in a PDF outside PDF strings, PDF stream data, and PDF cross reference tables. Thus, the byte sequence %%EOF can occur as a regular comment, or as non-comment content of a string or stream, any number of times between the signature value dictionary and the actual end of the signed revision.

  If there is such an occurrence, veraPDF picks a byte sequence as an end-of-file marker which was never meant as the end of anything.

Furthermore, unless the actual end-of-file is reached in the loop (and `pdfSource.length()` / `source.getStreamLength()` is returned), the result appears to be off by one: the `- 1` in `return result - 1` does not correspond with the use of the result.

veraPDF versions

I checked against the current 1.5.0-SNAPSHOT versions of veraPDF which are tagged:

veraPDF-pdfbox-validation 1.5.4
veraPDF-validation 1.5.2
veraPDF-parser 1.5.1

The OP's sample document

The sample document provided by the OP has a LF after the end-of-file marker. Due to this and the off-by-one issue mentioned above, veraPDF determines a nominal signed byte ranges end which is two bytes short.
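
The two-byte difference can be reproduced with a simplified model of the search quoted above. This is only a sketch working on an in-memory byte array, not the veraPDF code itself. `NominalEndDemo` and its methods are hypothetical names, and the sample bytes are a stand-in for a real signed PDF.

```java
import java.nio.charset.StandardCharsets;

// Simplified model of the quoted search; not the veraPDF code itself.
final class NominalEndDemo {

    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Offset of the first "%%EOF" at or after 'from', or -1 if there is none. */
    static int indexOfEofMarker(byte[] pdf, int from) {
        outer:
        for (int i = Math.max(from, 0); i + EOF_MARKER.length <= pdf.length; i++) {
            for (int j = 0; j < EOF_MARKER.length; j++) {
                if (pdf[i + j] != EOF_MARKER[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    /** Mimics the quoted logic: offset of the byte after 'F', then minus one. */
    static long nominalEnd(byte[] pdf, int from) {
        int start = indexOfEofMarker(pdf, from);
        if (start < 0) {
            return pdf.length; // no marker found: fall back to the file length
        }
        long afterF = start + EOF_MARKER.length;
        return afterF - 1; // the questionable "- 1"
    }

    public static void main(String[] args) {
        // Stand-in for a signed file that ends with "%%EOF" followed by a single LF
        byte[] pdf = "%PDF-1.7\n...signature...\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);
        long actualEnd = pdf.length;
        long nominal = nominalEnd(pdf, 0);
        System.out.println(actualEnd - nominal); // prints "2"
    }
}
```

One byte of the difference comes from the LF after the marker, the other from the `- 1`.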

score 2 · Answer 2 · answered Apr 28 '17 at 07:31

As discussed above, we just released the hotfix for veraPDF 1.4 that addresses the issues in this discussion. The new version is available for download: http://downloads.verapdf.org/rel/1.4/verapdf-1.4.5-installer.zip

In particular, iText-signed PDF/A-2 documents seem to pass veraPDF validation just fine.

Boris Doubrov

Great! Just a hint, though, how stack overflow is meant to be used: If you have an addendum to your existing answer (like in this case here), you usually edit it into that (there is an edit link right underneath it); you only create a new answer to present and elaborate on a completely different approach. As you see the answers are not simply sorted by date but also by a number of other criteria, among them acceptance and votes. Your multiple answers on the same approach, therefore, can too easily be separated by that sort order. – mkl Apr 28 '17 at 08:03
BTW, the `SignatureParser` fix looks different in 1.4.x and in 1.5.x, in 1.4.x it looks more complete, also taking a single CR into account, while not so in 1.5.x. – mkl Apr 28 '17 at 09:09

score 0 · Answer 3 · answered Apr 21 '17 at 19:51

I do agree with the analysis of how veraPDF checks the ByteRange at the moment. Indeed, it assumes the file terminates exactly at the %%EOF marker immediately following the signature field.

The reason is quite simple. The document can be signed sequentially by several people, and can still be a valid PDF/A-2B document. When the second signature is generated, it will incrementally update the file containing the first signature.

So, if we interpret the term file in the PDF/A-2B requirements literally:

When computing the digest for the file, it shall be computed over the entire file, including the signature dictionary but excluding the PDF Signature itself. This range is then indicated by the ByteRange entry of the signature dictionary.

we would never be able to create a valid PDF/A file with multiple signatures. This was clearly not the intention of the PDF/A-2 standard.

The PDF file is usually understood as the byte range from the leading %PDF to the trailing %%EOF, which allows, for example, for PDF files embedded in a bigger byte stream (e.g., mail attachments). This is what the veraPDF implementation is based on.

I do agree, however, that this approach does not take into account the optional end-of-line sequence after %%EOF. I've created the corresponding issue for veraPDF: https://github.com/veraPDF/veraPDF-validation/issues/166

It leaves an interesting question, though: what is a valid ByteRange of the first signature if the document has more signatures? I believe all of these cases:

- ByteRange covers the file up to the next following %%EOF marker
- ByteRange covers the file up to the next following %%EOF marker + a single CR character
- ByteRange covers the file up to the next following %%EOF marker + a single LF character
- ByteRange covers the file up to the next following %%EOF marker + a two-byte CR+LF sequence

should be allowed.
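
A minimal sketch of a check that accepts exactly these four endings is shown below. It assumes the end offset of the ByteRange is already known and looks backwards from there instead of searching forward for a marker. `RevisionEndCheck` is a hypothetical name, and this is not the veraPDF implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical check, not the veraPDF implementation.
final class RevisionEndCheck {

    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /**
     * True if the first 'end' bytes of 'file' finish with %%EOF,
     * optionally followed by exactly one end-of-line marker (CR, LF, or CR LF).
     */
    static boolean endsAtEofMarker(byte[] file, int end) {
        if (end < 0 || end > file.length) {
            return false;
        }
        int p = end;
        // Strip at most one end-of-line marker
        if (p > 0 && file[p - 1] == '\n') {
            p--;
            if (p > 0 && file[p - 1] == '\r') {
                p--; // CR LF pair
            }
        } else if (p > 0 && file[p - 1] == '\r') {
            p--; // single CR
        }
        if (p < EOF_MARKER.length) {
            return false;
        }
        // The bytes directly before the stripped position must be the marker
        for (int j = 0; j < EOF_MARKER.length; j++) {
            if (file[p - EOF_MARKER.length + j] != EOF_MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] file = "...\n%%EOF\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(endsAtEofMarker(file, file.length)); // prints "true"
    }
}
```

Because the check starts from the ByteRange end rather than from the signature dictionary, a stray %%EOF byte sequence earlier in the revision does not affect it.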
