How can I get the signature value from a signed PDF file? I can read all other data from the signature except its value. Is there a way to get it in C#?

PdfPKCS7 pk;
PdfReader reader = new PdfReader(PdfFilename);
AcroFields af = reader.AcroFields;

var names = af.GetSignatureNames();
foreach (string name in names)
{
    pk = af.VerifySignature(name);

    var CN_signer = iTextSharp.text.pdf.security.CertificateInfo.GetSubjectFields(pk.SigningCertificate).GetField("CN");
    var C_signer = iTextSharp.text.pdf.security.CertificateInfo.GetSubjectFields(pk.SigningCertificate).GetField("C");
    var CN_issuer = iTextSharp.text.pdf.security.CertificateInfo.GetIssuerFields(pk.SigningCertificate).GetField("CN");
    var OU_issuer = iTextSharp.text.pdf.security.CertificateInfo.GetIssuerFields(pk.SigningCertificate).GetField("OU");
    var O_issuer = iTextSharp.text.pdf.security.CertificateInfo.GetIssuerFields(pk.SigningCertificate).GetField("O");
    var C_issuer = iTextSharp.text.pdf.security.CertificateInfo.GetIssuerFields(pk.SigningCertificate).GetField("C");
    var nr_serial = pk.SigningCertificate.SerialNumber;
    var date = pk.SignDate.ToString();
}

[Image from the original post, not reproduced here]

Artjom B.
Sara
  • By signature value you mean the PKCS#7/CMS signature container? – mkl May 07 '15 at 09:31
  • Yes, I mean reading the signature in base64 from a signed PDF file – Sara May 07 '15 at 12:12
  • @Sara Your comment is unclear. Signatures aren't stored in base64 form. Please explain in your own words what you mean when you say "signature value". In the context of ISO-32000, the signature value is the value of the `/V` entry in a signature field. That value is a PDF dictionary. It is very easy to get that dictionary. If you confirm that your interpretation of "signature value" is identical to the signature value as described in ISO-32000, an example can be provided. – Bruno Lowagie May 07 '15 at 12:16
  • It isn't entirely clear to me how a signature is stored in a PDF or how it changes the PDF structure, but by signature value I mean the bytes stored in the /Contents entry of the PDF. Maybe it is something like a hexadecimal value – Sara May 07 '15 at 12:34
  • @Bruno Can you help me get the PDF dictionary? Now that I have read ISO-32000, I think the value of the /V entry is what I am looking for. – Sara May 07 '15 at 19:28

1 Answer

The OP clarified that the signature value was meant to refer to the PKCS#7/CMS signature container. The following sample method extracts it:

public void showSignatureValues(PdfReader reader)
{
    AcroFields fields = reader.AcroFields;
    foreach (String name in fields.GetSignatureNames())
    {
        Console.Write(" Signature {0}\n", name);

        PdfDictionary sigDict = fields.GetSignatureDictionary(name);
        PdfName subFilter = sigDict.GetAsName(PdfName.SUBFILTER);
        Console.Write("  SubFilter {0}\n", subFilter);

        PdfString contents = sigDict.GetAsString(PdfName.CONTENTS);
        if (contents != null)
        {
            byte[] contentBytes = contents.GetOriginalBytes();
            string contentBase64 = Convert.ToBase64String(contentBytes);
            // contentBytes contains the actual signature container as is,
            // contentBase64 contains it encoded using Base64 for better printability
            Console.Write("  Content {0}\n", contentBase64);
        }
    }
}

One remark, though: You will find that the contentBytes usually contains numerous 00 bytes after the signature container bytes (in the Base64 representation they show as a long string of letters A). This is because very often a generous estimate concerning the signature container size is made when preparing a PDF for signing, and more than enough space is reserved for the injection of it.

According to the specification, since the length of PKCS#7 objects is not entirely predictable, the value of Contents shall be padded with zeros at the end.

Using an ASN.1 parser you can determine how long the actual signature container byte sequence is and where the padding starts.

In theory the value of Contents shall be a DER-encoded PKCS#7 binary data object; as DER encoding rules do not allow the indefinite-length method, the size of the signature container should be determinable from its first few bytes. Unfortunately there are numerous PDFs in the wild in which the outer layers of the signature container are merely BER encoded and only certain inner objects are DER encoded. Thus, complete parsing can be required.
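
For the well-behaved case, in which the outermost object uses a definite length, reading the length from the first few bytes can be sketched as follows. This is only an illustration and not part of iTextSharp: the class and method names are hypothetical, only single-byte tags are handled, and -1 is returned whenever the header alone is not enough (for example for indefinite-length BER), in which case a real ASN.1 parser is needed.

```csharp
using System;

// Hypothetical helper, not part of iTextSharp.
public static class SignatureContainerLength
{
    // Returns the total length (header plus content) of the outermost
    // definite-length ASN.1 object at the start of data, or -1 if the
    // length cannot be determined from the header alone.
    public static int GetOuterObjectLength(byte[] data)
    {
        if (data == null || data.Length < 2) return -1;
        // Assumption: single-byte tag; high-tag-number form is not handled.
        if ((data[0] & 0x1F) == 0x1F) return -1;

        int first = data[1];
        long total;
        if (first < 0x80)
        {
            // Short form: the length byte is the content length itself.
            total = 2 + first;
        }
        else if (first == 0x80)
        {
            // Indefinite length (BER only): complete parsing is required.
            return -1;
        }
        else
        {
            // Long form: the low 7 bits give the number of length bytes.
            int count = first & 0x7F;
            if (count > 4 || data.Length < 2 + count) return -1;
            long length = 0;
            for (int i = 0; i < count; i++)
            {
                length = (length << 8) | data[2 + i];
            }
            total = 2 + count + length;
        }
        return total <= data.Length ? (int)total : -1;
    }

    // Returns a copy of data without the trailing padding, or data itself
    // if the length could not be determined.
    public static byte[] TrimPadding(byte[] data)
    {
        int length = GetOuterObjectLength(data);
        if (length < 0) return data;
        byte[] trimmed = new byte[length];
        Array.Copy(data, trimmed, length);
        return trimmed;
    }
}
```

Applied to the contentBytes from the method above, SignatureContainerLength.TrimPadding(contentBytes) would drop the trailing 00 bytes whenever the outer length is definite.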


Afterthoughts

In the answer above I claimed bluntly that the sample code returns a PKCS#7/CMS signature container. Actually it is such a signature container only in most cases; it depends on the SubFilter of the signature field value.

Let's look at the SubFilter values defined in ISO 32000-1 (the PDF specification) and in the ETSI Technical Specification 102778 parts (PAdES):

  • adbe.x509.rsa_sha1 ISO 32000-1 - In this case the contents actually are a DER-encoded PKCS#1 binary data object. This is the case depicted in the OP's graphic.

    The OP here calls the contents an encrypted digest which is only part of the truth because

    1. the PKCS#1 data object is constructed not from the bare digest but from a structure containing both that digest and the OID of the digest algorithm, and

    2. depending on the signature algorithm this structure may not be encrypted (as something that can be decrypted back to the digest again) but instead only a number may be derived from it which cannot be decrypted back to the structure but merely tested against an alleged document digest structure.

    This format is hardly in use anymore.

  • adbe.pkcs7.detached ISO 32000-1, ETSI TS 102778-2 - The contents are a DER-encoded PKCS#7 binary data object signing the byte range directly, i.e. normally the byte range digest is in the signed attribute MessageDigest.

  • adbe.pkcs7.sha1 ISO 32000-1, ETSI TS 102778-2 - The contents are a DER-encoded PKCS#7 binary data object signing the byte range indirectly, i.e. the byte range SHA1 digest is put into the container as data which in turn is signed normally.

  • ETSI.CAdES.detached ETSI TS 102778-3 - The contents are a DER-encoded SignedData object as specified in CMS signing the byte range directly; essentially this is a specially profiled variant of adbe.pkcs7.detached.

  • ETSI.RFC3161 ETSI TS 102778-4 - The contents are a TimeStampToken as specified in RFC 3161 stamping the byte range directly; this is a time stamp format closely related to PKCS#7. (This is a special case as the form field type is not Sig but DocTimeStamp.)

Only in the case of adbe.x509.rsa_sha1 are the certificates involved included in separate signature dictionary entries. In all other cases certificates (and other security related material) are included in the SignedData structure in the Contents.
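
The distinction above can be condensed into a small lookup that tells what kind of object to expect in Contents for a given SubFilter. This is a sketch only: the class and method names are hypothetical, and the leading slash is stripped on the assumption that the SubFilter name is passed in the form it is printed by the sample method above.

```csharp
using System;

// Hypothetical helper summarising the SubFilter list above.
public static class SubFilterInfo
{
    public static string DescribeContents(string subFilter)
    {
        // Accept the name with or without the leading slash of a PDF name.
        string key = (subFilter ?? string.Empty).TrimStart('/');
        switch (key)
        {
            case "adbe.x509.rsa_sha1":
                // Certificates are in separate signature dictionary entries.
                return "PKCS#1 signature value";
            case "adbe.pkcs7.detached":
            case "adbe.pkcs7.sha1":
            case "ETSI.CAdES.detached":
                return "PKCS#7/CMS SignedData container";
            case "ETSI.RFC3161":
                return "RFC 3161 TimeStampToken";
            default:
                return "unknown";
        }
    }
}
```

For example, SubFilterInfo.DescribeContents("/adbe.pkcs7.detached") yields "PKCS#7/CMS SignedData container", so a PKCS#7/CMS parser is the right tool for those contents.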

mkl
  • Thanks @mkl I was working on the release of iText 5.5.6 and I didn't have the time to answer this question. Your answer is more complete than mine would have been :D – Bruno Lowagie May 08 '15 at 09:44
  • Thank you @mkl for explanation, I just have another question! Is this code you have written supposed to show only this part of signature dictionary (part signed with red in picture above) or it is showing more, because as I can see when I'm trying in my code it is including certificate and some other data too – Sara May 08 '15 at 20:45
  • Your image uses the signature type (subfilter adbe.x509.rsa_sha1) in which the contents only contain a naked PKCS#1 signature and other data are represented in other fields. The types predominantly in use nowadays and considered interoperable, though, are other ones in which the contents contain full-fledged PKCS#7/CMS signature containers in which certificates and many other data are embedded, too. Do yourself a favor and don't spend too much time on exotic special cases like the one in your image. – mkl May 08 '15 at 21:41
  • So does it mean that it is not possible to get only the encrypted digest from content of PDF without certificate and timestamp data? – Sara May 12 '15 at 09:56
  • *does it mean that it is not possible to get only the encrypted digest from content of PDF without certificate and timestamp data* - First of all, when I asked whether by signature value you meant the PKCS#7/CMS signature container, you said *Yes*. Thus, the answer focused on retrieving that signature container, not some naked encrypted digest. That being said, of course you can access some encrypted digest value from inside the signature container. You simply use some PKCS#7/CMS parser classes, e.g. provided by Bouncy Castle. But beware, ... – mkl May 12 '15 at 11:47
  • But beware, that encrypted digest is not the digest of the signed PDF byte ranges but of some signed attributes one of which is that digest. You might want to read [this answer to "Obtaining the hash/digest from a PCKS7 signed PDF file with iText"](http://stackoverflow.com/a/29969592/1729265) and [this answer to "Message digest of pdf in digital signature"](http://stackoverflow.com/a/28429984/1729265). – mkl May 12 '15 at 11:50