I have a signed PDF. The signature covers the entire document and it is valid.

I want to extract the original pdf to compare its hash with that of the unsigned pdf.

I extract the original PDF using the following code:

PdfReader reader = new PdfReader(FILESIGNED);
AcroFields acroFields = reader.getAcroFields();
// the PDF contains exactly one signature
String signatureName = acroFields.getSignatureNames().get(0);
// copy the revision returned for this signature to FILEORIGINAL
try (InputStream in = acroFields.extractRevision(signatureName);
     OutputStream out = new FileOutputStream(FILEORIGINAL)) {
    byte[] buffer = new byte[1024];
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
    }
} finally {
    reader.close();
}

But the extracted PDF is not the same as the original. I would like to extract the revision from before the signature was applied. Is that possible?

Thanks for help. Sara

Joris Schellekens
Sara

1 Answer

"I want to extract the original pdf to compare its hash with that of the unsigned pdf."

In general this is not possible.

When iText (or other PDF signing libraries or applications) sign a PDF, they:

  1. add a signature form field to the PDF (unless an empty signature form field exists and is chosen for use in signing);
  2. add a dictionary object to the PDF with some signing related entries, in particular a big placeholder entry into which eventually a CMS signature container will be inserted; this dictionary is set as the value of the aforementioned form field;
  3. add a visualization to the form field, often containing some data from the signer certificate (unless the signature is chosen to be invisible);
  4. make some other form fields read-only if an empty signature form field with field lock information is signed;
  5. finalize the PDF, i.e. they set metadata like time-of-last-change and then write the finished PDF into a file or some byte array;
  6. calculate the hash value of the finished PDF excluding the value of the big placeholder but including all other changes made as described above;
  7. sign this hash value resulting in a CMS signature container;
  8. and put this signature container into the big placeholder.
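Step 6 of this list can be illustrated with a small sketch. It is not iText code: the class name `ByteRangeDigest`, the choice of SHA-256 and the toy data are assumptions made for illustration. It hashes only the byte ranges given as (offset, length) pairs, the way a signature's byte range array describes them, and skips the placeholder in between.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of iText.
final class ByteRangeDigest {

    /**
     * Hashes the parts of the file listed in byteRange (pairs of offset, length).
     * Everything between the ranges, i.e. the signature placeholder, is skipped.
     */
    static byte[] digest(byte[] file, long[] byteRange) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("byteRange must hold (offset, length) pairs");
        }
        try {
            // SHA-256 is an assumption; the actual algorithm depends on the signature.
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (int i = 0; i < byteRange.length; i += 2) {
                int offset = Math.toIntExact(byteRange[i]);
                int length = Math.toIntExact(byteRange[i + 1]);
                md.update(file, offset, length);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Toy data, not a real PDF: the placeholder occupies offsets 4 to 13.
        byte[] empty = "HEAD<00000000>TAIL".getBytes(StandardCharsets.US_ASCII);
        byte[] filled = "HEAD<deadbeef>TAIL".getBytes(StandardCharsets.US_ASCII);
        long[] byteRange = {0, 4, 14, 4};
        // Filling the placeholder does not change the hash of the signed ranges.
        System.out.println(Arrays.equals(digest(empty, byteRange), digest(filled, byteRange)));
    }
}
```

Because the placeholder is excluded, the signature container can be written into it afterwards (step 8) without invalidating the hash, while every other change from steps 1 to 5 is covered by it.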

Thus, in general the "original pdf" cannot be extracted anymore from the signed PDF file because the changes described above may have fundamentally changed the internal structure of the PDF.

There is one exception, though: If those changes were applied as an incremental update (in iText lingo: in append mode), it usually is possible to retrieve the original by cutting off that incremental update.

For this one merely has to search for the last end-of-file marker before the signature and cut off everything after it. (Actually there is a small amount of uncertainty: a final end-of-line marker may or may not be part of the original PDF.)
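The cut-off described above could be sketched roughly as follows. This is plain Java rather than iText code; the class `IncrementalUpdateCutter`, its method names and the `keepTrailingEol` switch are hypothetical. The sketch assumes you already know an offset inside the signing update (for example the start of the signature placeholder) and that the file was signed in append mode. It treats every `%%EOF` byte sequence as a real end-of-file marker, which may be wrong if that sequence also occurs elsewhere in the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of iText.
final class IncrementalUpdateCutter {
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index just past the last "%%EOF" ending at or before limit, or -1. */
    static int endOfLastEofMarkerBefore(byte[] pdf, int limit) {
        for (int start = Math.min(limit, pdf.length) - EOF_MARKER.length; start >= 0; start--) {
            if (matchesAt(pdf, start)) {
                return start + EOF_MARKER.length;
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int start) {
        for (int i = 0; i < EOF_MARKER.length; i++) {
            if (data[start + i] != EOF_MARKER[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Cuts the file after the last "%%EOF" before signatureOffset.
     * keepTrailingEol decides whether one following end-of-line (CR, LF or CR LF)
     * is kept; which choice matches the original cannot be told from the file alone.
     */
    static byte[] cutBefore(byte[] pdf, int signatureOffset, boolean keepTrailingEol) {
        int end = endOfLastEofMarkerBefore(pdf, signatureOffset);
        if (end < 0) {
            throw new IllegalArgumentException("no %%EOF marker before offset " + signatureOffset);
        }
        if (keepTrailingEol) {
            if (end < pdf.length && pdf[end] == '\r') end++;
            if (end < pdf.length && pdf[end] == '\n') end++;
        }
        return Arrays.copyOf(pdf, end);
    }
}
```

If the hash of the unsigned file is known, one can simply try both values of `keepTrailingEol` and see whether either result matches.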

mkl
  • Thanks for the help. I have another question. I have read this question: Get the original content from a signed pdf. If the PDF has one signature, it is not possible to get the original PDF. But if it has more signatures (revisions), revision 1 can be retrieved from revision 2. My question is: is it possible to create a first revision in a PDF (for example by adding a hidden comment) and add the signature afterwards, so that it is easier to get the previous revision of the document? Thanks for help – Sara Sep 04 '17 at 08:34
  • @Sara To be able to get the original version, the first signature must already be applied as an incremental update; as you're using iText, that means *using append mode*. (You need to use the `PdfStamper.createSignature` overload with a `boolean` parameter and set that parameter to `true`.) If you want to avoid the small uncertainty I mentioned in my answer, you can add document metadata containing the length of the original version, so there is no doubt whether or not a trailing end-of-line is part of the original. – mkl Sep 04 '17 at 09:48
  • We always sign with `createSignature(reader, signedPDF, '\0', tempFile, true)`. But does only a signature create a revision in a PDF, or are there other changes that create a revision? – Sara Sep 04 '17 at 10:12
  • @Sara On whether only a signature creates a revision: PDF has no explicit concept of a *revision*. Some software products *read* a concept of revisions into certain structures in the PDF. The clearest such structures are **A** a *signed revision*, because the signed byte ranges imply its size, and **B** the *current revision*, which simply is the whole file. iText has code to extract type **A** revisions, and for the type **B** revision nothing needs to be done to extract it. – mkl Sep 04 '17 at 11:26
  • @Sara If you want to extract other kinds of revisions, things become somewhat vague: as mentioned in my answer, an end-of-line marker after the end-of-file marker might or might not have been part of the original version. Furthermore, the end-of-file marker syntactically is merely a PDF comment. Some jokester, therefore, might have scattered such comments all over the PDF, and you would have to find out which of them are mere comments to ignore and which ones actually once denoted the end of a PDF... – mkl Sep 04 '17 at 11:31
  • One option... if you still have a copy of the unsigned pdf, you in particular have its size. In that case you can simply check whether that exact number of initial bytes of the signed pdf coincide with your unsigned file, either by direct comparison or via hashes. – mkl Sep 05 '17 at 07:55
  • @mkl thanks for the answer. I have one signature in the PDF file and need to convert the file back to the original. I use `createSignature(reader, signedPDF, '\0', null, true)` and `fields.extractRevision(sigName)`, but it doesn't work! – Максим Казаченко Nov 06 '19 at 13:35
  • @МаксимКазаченко you apparently misinterpret `fields.extractRevision(sigName)`. That call extracts the revision in which that signature field was signed, not the revision before it, as you appear to assume. – mkl Nov 06 '19 at 15:34
  • In your case you should proceed as described in the final two paragraphs of my answer, *"There is one exception..."* and *"For this one merely has to..."*. – mkl Nov 06 '19 at 15:37
  • @mkl thanks my friend! I just cut the bytes with the signature to return the original and everything works. Does the code look normal? – Максим Казаченко Nov 08 '19 at 11:06
  • `AcroFields fields = reader.getAcroFields(); String signatureName = fields.getSignatureNames().get(0); PdfDictionary sigDict = fields.getSignatureDictionary(signatureName); PdfArray rangeSignature = sigDict.getAsArray(PdfName.BYTERANGE); byte[] sourceBytes = result.getContent(); byte[] firstPartArray = Arrays.copyOfRange(sourceBytes, 0, (int) rangeSignature.asLongArray()[1]); byte[] secondPartArray = Arrays.copyOfRange(sourceBytes, (int) rangeSignature.asLongArray()[2], sourceBytes.length); byte[] lastArray = ArrayUtils.addAll(firstPartArray, secondPartArray); return lastArray; ` – Максим Казаченко Nov 08 '19 at 11:10
  • @МаксимКазаченко *"Does the code look normal?"* - Actually, code spanning multiple lines in a comment always looks bad; if you have a question, please make it an actual Stack Overflow question, not a comment here. That being said, it looks like your code determines the bytes of the signed PDF which are signed. This is *not* the original which, according to your first comment here, you are looking for. – mkl Nov 14 '19 at 11:52
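The check suggested in the comments for the case where the unsigned file is still available (do the first bytes of the signed PDF coincide with the unsigned file?) might look like the following sketch. It uses only the Java standard library; the class name `PrefixCheck` and the two command-line arguments are hypothetical. A positive result is only to be expected if the signature was applied in append mode.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of iText.
final class PrefixCheck {

    /** True if the signed file starts with exactly the bytes of the unsigned file. */
    static boolean startsWith(byte[] signed, byte[] unsigned) {
        if (unsigned.length > signed.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(signed, unsigned.length), unsigned);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path of the signed PDF, args[1]: path of the unsigned PDF (both assumed)
        byte[] signed = Files.readAllBytes(Paths.get(args[0]));
        byte[] unsigned = Files.readAllBytes(Paths.get(args[1]));
        System.out.println(startsWith(signed, unsigned)
                ? "The signed file starts with the unsigned file."
                : "The signed file does not start with the unsigned file.");
    }
}
```

Comparing the bytes directly avoids the trailing end-of-line question, because the length of the unsigned file fixes exactly where the original ends.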