20

I am trying to find out if it is possible to read PDF Form data (Forms filled in and saved with the form) using iTextSharp. How can I do this?

DLeh
Bhuvan

6 Answers

23

You first need to find out the field names in the PDF form. Then get the form's fields and read their values by name.

string pdfTemplate = "my.pdf";
PdfReader pdfReader = new PdfReader(pdfTemplate);
AcroFields fields = pdfReader.AcroFields; // AcroFields is the form object; AcroFields.Fields is the dictionary of field items
string val = fields.GetField("fieldname");

In the code above, "fieldname" is the name of the PDF form field, and the GetField method returns that field's value as a string. Here is an article with example code that you could probably use. It shows how to both read and write form fields using iTextSharp.

RivieraKid
cecilphillip
  • this works like a charm... I wonder why I haven't looked into this function.. when I tried every other function :). Thanks a lot.. u saved my weekend. – Bhuvan Jul 30 '10 at 23:22
  • 1
    Public Service Announcement: The following code will not get you what you want: `pdfReader.AcroFields.Fields["fieldName"].Value` . I wasted a few hours before I found this post. – Walter Stabosz Jun 21 '13 at 13:55
  • 2
    Hi. I think there is an error in the third line of your example. The correct form would be: `AcroFields fields = pdfReader.AcroFields;` – cesAR Jul 18 '17 at 17:43
  • Why might a PDF list with zero fields, even though it does have fields? – Andrew Truckle Dec 09 '22 at 14:37
16

Maybe the iTextSharp library has changed recently but I wasn't able to get the accepted answer to work. Here is my solution:

var pdf_filename = "pdf2read.pdf";
using (var reader = new PdfReader(pdf_filename))
{
    var fields = reader.AcroFields.Fields;

    foreach (var key in fields.Keys)
    {
        var value = reader.AcroFields.GetField(key);
        Console.WriteLine(key + " : " + value);
    }
}

The difference is subtle: reader.AcroFields.Fields returns an IDictionary of field items rather than an AcroFields object, so the keys come from the dictionary and the values are read with reader.AcroFields.GetField.

Adam Jones
  • this works, but is really slow, takes over a minute to read ~3000 fields. Anyone know a faster way to enumerate these? I tried doing it in parallel but that didn't seem to help. – DLeh Oct 18 '16 at 04:47
  • I can't get it to work with the PDF I am using. Shows no fields. – Andrew Truckle Dec 09 '22 at 14:36
3

If you are using PowerShell, the discovery code for fields is:

    Add-Type -Path C:\Users\Micah\Desktop\PDF_Test\itextsharp.dll
    $MyPDF = "C:\Users\Micah\Desktop\PDF_Test\something_important.pdf"
    $PDFDoc = New-Object iTextSharp.text.pdf.PdfReader -ArgumentList $MyPDF
    $PDFDoc.AcroFields.Fields

That code will give you the names of all the fields on the PDF Document, "something_important.pdf".

This is how you access each field once you know the name of the field:

    $PDFDoc.AcroFields.GetField("Name of the field here")
3

This worked for me. Note the extra parameters when constructing the stamper: '\0' (keep the original PDF version) and true (append mode).

string TempFilename = Path.GetTempFileName();

PdfReader pdfReader = new PdfReader(FileName);
//PdfStamper stamper = new PdfStamper(pdfReader, new FileStream(TempFilename, FileMode.Create));
PdfStamper stamper = new PdfStamper(pdfReader, new FileStream(TempFilename, FileMode.Create), '\0', true);

AcroFields fields = stamper.AcroFields;
AcroFields pdfFormFields = pdfReader.AcroFields;

foreach (KeyValuePair<string, AcroFields.Item> kvp in fields.Fields)
{
    string FieldValue = GetXMLNode(XMLFile, kvp.Key);
    if (FieldValue != "")
    {
        fields.SetField(kvp.Key, FieldValue);
    }
}

stamper.FormFlattening = false;
stamper.Close();
pdfReader.Close();
Andrew Truckle
Serg
  • The OP only wanted to *read PDF Form data* (and got a good answer for that). Your code shows how to *change PDF form data.* – mkl Sep 03 '13 at 08:10
  • Sorry, actually posted the answer into the wrong thread... This meant to be explanation on how push values into the fields and preserve form editing when the file is opened again... – Serg Sep 04 '13 at 20:02
2

The PDF is named "report.pdf".

The field to be read into TextBox1 is named "TextField25" in the PDF.

        Dim pdf As String = "report.pdf"
        Dim reader As New PdfReader(pdf)
        Dim fields As AcroFields = reader.AcroFields
        TextBox1.Text = fields.GetField("TextField25")

Important note: this works only if the PDF was not flattened when it was created with iTextSharp, that is, the fields must still be editable.

For example:

       pdfStamper.FormFlattening = False

This is very simple, and it works like a charm. :)
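The flattening caveat above can be checked before reading a value. The following is a minimal C# sketch, assuming iTextSharp 5.x; the class name FormFieldCheck and the fallback file name are hypothetical, and "TextField25" is the field name used in this answer. An empty Fields dictionary would be consistent with a flattened form, though it may have other causes.

```csharp
using System;
using iTextSharp.text.pdf;

// Hypothetical helper, for illustration only.
static class FormFieldCheck
{
    static void Main(string[] args)
    {
        // Assumption: the path is passed as the first argument, else "report.pdf".
        string path = args.Length > 0 ? args[0] : "report.pdf";

        PdfReader reader = new PdfReader(path);
        try
        {
            AcroFields form = reader.AcroFields;

            // Fields is a dictionary keyed by field name; it is empty
            // when the document exposes no AcroForm fields.
            if (form.Fields.Count == 0)
            {
                Console.WriteLine("No form fields found (the form may have been flattened).");
                return;
            }

            // Only look the field up if it is actually present.
            if (form.Fields.ContainsKey("TextField25"))
            {
                Console.WriteLine("TextField25 : " + form.GetField("TextField25"));
            }
            else
            {
                Console.WriteLine("The form has fields, but none named TextField25.");
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Checking the dictionary first separates "the form has no fields at all" from "this particular field name is wrong", which are otherwise easy to confuse.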

EIV
0

If anybody is still wondering about this, here is how I extracted the text from a field with iText 7 (provided you know the field name):

PdfReader reader = new("filepath");
PdfDocument doc = new(reader);
// false: do not create an AcroForm if the document has none
PdfAcroForm form = PdfAcroForm.GetAcroForm(doc, false);

string value = form.GetField("FieldNameHere").GetValueAsString();

This works with iText 7.1.16.

Misguided Chunk