
How can I get the original content from a signed PDF document using the iText Java library, or another library?

Thanks

UPDATE 1:

Possible example:

PdfReader reader = new PdfReader(PATH_TO_PDF);
AcroFields fields = reader.getAcroFields();
ArrayList<String> signatures = fields.getSignatureNames();
for (String signature : signatures)
{
    // Start revision extraction
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte bb[] = new byte[8192];
    InputStream ip = fields.extractRevision(signature);
    int n = 0;
    while ((n = ip.read(bb)) > 0)
        out.write(bb, 0, n);
    out.close();
    ip.close();
    MessageDigest md = MessageDigest.getInstance("SHA1");
    byte[] resum = md.digest(out.toByteArray());
    // End revision extraction        
}

Note 1: When the document has multiple signatures, this example extracts the revision for each of them.

Note 2: However, the resulting hash does not match the hash of the original (unsigned) document.

Eduardo

1 Answer


Please take a look at the following image:

enter image description here

In this case, you have a PDF file (starting with %PDF-1. and ending with %%EOF) and the digital signature is part of the document itself. It is the value of the /Contents key in the signature dictionary, that is in turn the value of the /V entry in the signature field dictionary.
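
To make this concrete: besides /Contents, the signature dictionary normally carries a /ByteRange array listing the byte ranges of the file that the signature covers, which is everything except the /Contents value itself. The following is a minimal, library-free sketch of hashing those ranges. `ByteRangeDigest` and its method are hypothetical names used for illustration, not iText API, and the sketch assumes you have already read the /ByteRange integers from the signature dictionary.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of iText.
class ByteRangeDigest {

    // byteRange holds (offset, length) pairs as found in /ByteRange,
    // typically four integers: [offset1, length1, offset2, length2].
    static byte[] digest(byte[] pdf, int[] byteRange, String algorithm) {
        if (byteRange.length == 0 || byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("Expected (offset, length) pairs");
        }
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest: " + algorithm, e);
        }
        // Feed each covered range in order; the gap between the ranges
        // (where the /Contents value sits) is skipped.
        for (int i = 0; i < byteRange.length; i += 2) {
            md.update(pdf, byteRange[i], byteRange[i + 1]);
        }
        return md.digest();
    }
}
```

Calling `ByteRangeDigest.digest(pdfBytes, new int[] {0, 4, 9, 4}, "SHA-256")` hashes bytes 0 to 3 and 9 to 12 and skips the five bytes in between. The result is the digest of the signed bytes, not the hash of the PDF as it was before signing.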

It is not possible to get the original PDF as it once was, because the original PDF was altered: objects were renumbered, a signature field was either added or "filled out" by adding a signature dictionary.

You can remove the signature, but that won't give you the original PDF file.

PdfReader reader = new PdfReader(SIGNED); 
AcroFields acroFields = reader.getAcroFields(); 
acroFields.removeField("sig"); 
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(UNSIGNED)); 
stamper.close(); 
reader.close(); 

In this case, SIGNED is the path to a file with a signature named "sig". We remove the complete signature (including the signature field). The path to the resulting file is UNSIGNED, and that file no longer contains any trace of the signature field "sig". However, it is not the original PDF that was signed.

Now look at the following image:

enter image description here

This shows a PDF with three signatures. The first signature was added the way I previously described: you can no longer get the original document.

However, the second and third signatures were added in append mode. This is the only way to add extra signatures, because altering revision 1 would break the first signature.

If you have revision 3 (marked Rev3), it is very easy to retrieve revisions 1 and 2 (Rev1 and Rev2). This is shown in the Signatures example:

PdfReader reader = new PdfReader(SIGNED);
AcroFields af = reader.getAcroFields();
FileOutputStream os = new FileOutputStream(REVISION);
byte bb[] = new byte[1028];
InputStream ip = af.extractRevision("first");
int n = 0;
while ((n = ip.read(bb)) > 0)
    os.write(bb, 0, n);
os.close();
ip.close();

In this example "first" is the name of the signature field, SIGNED is the path to the file with the signature and REVISION is the path to the revision that results from this operation.
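
Because append mode never touches the earlier bytes, each earlier revision is a byte-for-byte prefix of the signed file. If the first signature was also applied in append mode, the unsigned original is one of those prefixes, so a previously stored hash of the original can be matched against them. The sketch below does this without iText by treating every %%EOF marker as a possible revision end. `RevisionCandidates` and its methods are hypothetical names, and scanning for %%EOF is a heuristic assumption (the marker could also occur inside stream data), not necessarily how `extractRevision` works internally.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of iText.
class RevisionCandidates {

    private static final byte[] EOF_MARKER =
            "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Returns the exclusive end offsets of all prefixes that might be an
    // earlier revision: each "%%EOF", with and without trailing CR/LF bytes.
    static List<Integer> candidateEnds(byte[] pdf) {
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i + EOF_MARKER.length <= pdf.length; i++) {
            if (!markerAt(pdf, i)) {
                continue;
            }
            int end = i + EOF_MARKER.length;
            ends.add(end);
            if (end < pdf.length && pdf[end] == '\r') {
                end++;
                ends.add(end);
            }
            if (end < pdf.length && pdf[end] == '\n') {
                end++;
                ends.add(end);
            }
        }
        return ends;
    }

    private static boolean markerAt(byte[] data, int offset) {
        for (int j = 0; j < EOF_MARKER.length; j++) {
            if (data[offset + j] != EOF_MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    // Digest of the first `length` bytes of `data`.
    static byte[] digestPrefix(byte[] data, int length, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            md.update(data, 0, length);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest: " + algorithm, e);
        }
    }

    // Returns the length of the first candidate prefix whose digest equals
    // storedHash, or -1 if none matches.
    static int findOriginalLength(byte[] signedPdf, byte[] storedHash, String algorithm) {
        for (int end : candidateEnds(signedPdf)) {
            if (MessageDigest.isEqual(digestPrefix(signedPdf, end, algorithm), storedHash)) {
                return end;
            }
        }
        return -1;
    }
}
```

A result of -1 from `findOriginalLength(signedBytes, storedHash, "SHA-256")` means no prefix matched, which is what you should expect whenever the first signature was not applied in append mode.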

Bruno Lowagie
  • Thanks. So if I've understood your explanation, it is not possible to recover the original content because the first signature was not added in append mode. – Eduardo Apr 13 '15 at 12:11
  • In fact, what I really need is the hash of the original document that has been signed. Is there any way to achieve it from the signed document? Thanks – Eduardo Apr 13 '15 at 12:13
  • Not of the original document; however, you can get the hash of the blue part in the image. If it weren't possible to get the original hash that was signed, it wouldn't be possible to verify the signature. Why would you need the hash of the original document? Why isn't the hash that was signed sufficient? – Bruno Lowagie Apr 13 '15 at 12:34
  • @Eduardo *the hash of the original document that has been signed* - please check whether already that first signature has been applied in append-mode. Numerous signing services do so. In that case you can deduce the original PDF (or at least a few candidates one of which likely is the original) and, therefore, its hash. – mkl Apr 13 '15 at 12:49
  • @BrunoLowagie I want to verify the signature. I know that calling "fields.verifySignature(signame)" I'm verifying the signature, but I want to check if the hash of that signature matches the hash of a document that I've previously stored in database – Eduardo Apr 13 '15 at 13:48
  • The hash of the document *before* it was signed or *after* it was signed? If *before*, then you can only hope that the PDF was signed in append mode. – Bruno Lowagie Apr 13 '15 at 14:09
  • @BrunoLowagie Before. How can I tell whether the PDF was signed in append mode? And then, how could I obtain the hash? Thanks – Eduardo Apr 13 '15 at 14:40
  • I've found an example of how I can obtain the hash, although I'm not sure if it is the best way or if it always works. I'm going to update my question with this example. – Eduardo Apr 13 '15 at 14:42
  • @mkl Thanks for your help, I hadn't seen your reply until now. I don't know how to check whether the first signature was applied in append mode. And in that case, how can I deduce the original PDF? Thanks – Eduardo Apr 13 '15 at 15:24
  • @Eduardo *I want to check if the hash of that signature matches the hash of a document that I've previously stored in database* - **It won't!** Technically the original, unsigned document itself is not signed as is; instead this original file is prepared for signing by adding some structures and information, and this extended version of your original file is signed. – mkl Apr 13 '15 at 15:25
  • @mkl Ok, I've understood now although I don't like it at all. My web application is signing a pdf document with an applet. The signed document is being uploaded to the server. And I want to verify the signed document against the original document. How could I do that if the extended version is signed instead of the original one? – Eduardo Apr 13 '15 at 15:45
  • *How could I do that if the extended version is signed instead of the original one?* - that's not trivial. Essentially you have to compare the object structures of both PDFs and check whether the differences are only of a type used for signature embedding. Some shortcuts are possible if you always sign in append mode. – mkl Apr 13 '15 at 20:52
  • @Eduardo Does signing have to be completely implemented on client-side? Instead you could prepare the PDF for signing on the server, calculate the hash of the byte ranges there, transfer the hash to the client, generate a normal CMS signature using that hash there, return the CMS signature to the server and embed it into the prepared PDF there. In that case you can be sure that the correct document is signed. – mkl Apr 14 '15 at 10:06
  • @mkl Thanks. Perhaps I could do as you suggest... But I'm not sure how to prepare the PDF for signing on the server and then, after the client has signed the hash of the prepared PDF, how to embed the signature into the prepared PDF. – Eduardo Apr 14 '15 at 12:00
  • You might want to study "DIGITAL SIGNATURES FOR PDF DOCUMENTS", a whitepaper by Bruno Lowagie which you can retrieve [here](http://pages.itextpdf.com/ebook-digital-signatures-for-pdf.html). It contains a chapter on distributed client-server use cases. – mkl Apr 14 '15 at 12:08