
I use iText to read a PDF document that contains an XFA form. I convert the form to XML, read the data from the XML and insert it into a database. But if the PDF doesn't contain an XFA form, how can I efficiently read data from it?

Joris Schellekens
hrishi

1 Answer


It depends on your expectations.

  • You can use text extraction to retrieve all the text on a given page. How you then process that text is up to you (for example, with regular expressions).

  • You can also use pdf2Data, an iText 7 add-on that matches documents against templates. It seems like a good fit for your workflow, since it produces XML output, which you already know how to read and load into a database.

More information on pdf2Data can be found at http://itextpdf.com/itext7/pdf2Data
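As a minimal sketch of the text-extraction route, assuming iText 7's kernel module is on the classpath: the code below pulls the plain text of each page with `PdfTextExtractor.getTextFromPage` and then applies a regular expression to it. The `Label: value` layout, the labels `Name` and `Date of birth`, and the class and method names are hypothetical; adapt the pattern to whatever your documents actually print.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageTextExtractor {

    /** Returns the plain text of every page, one string per page. */
    static String[] extractPages(String pdfPath) throws IOException {
        try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(pdfPath))) {
            int pageCount = pdfDoc.getNumberOfPages();
            String[] pages = new String[pageCount];
            for (int i = 1; i <= pageCount; i++) { // iText page numbers are 1-based
                pages[i - 1] = PdfTextExtractor.getTextFromPage(pdfDoc.getPage(i));
            }
            return pages;
        }
    }

    /**
     * Finds a line of the form "Label: value" and returns the trimmed value,
     * or null if no such line exists. The "Label: value" layout is an assumption.
     */
    static String findLabeledValue(String text, String label) {
        // [ \t]* instead of \s* so the match never jumps to the next line
        Pattern pattern = Pattern.compile(
                "^[ \\t]*" + Pattern.quote(label) + "[ \\t]*:[ \\t]*(.+?)[ \\t]*$",
                Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        String[] pages = extractPages(args[0]);
        // Hypothetical labels; replace them with the labels printed in your documents
        for (String label : new String[] {"Name", "Date of birth"}) {
            for (String pageText : pages) {
                String value = findLabeledValue(pageText, label);
                if (value != null) {
                    System.out.println(label + " = " + value);
                    break; // stop at the first page that contains the label
                }
            }
        }
    }
}
```

Run it as `java PageTextExtractor input.pdf`. Because plain text loses most of the layout, this works best when each value sits on the same line as its label; when it doesn't, restricting extraction to a known area of the page is more reliable.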

Joris Schellekens
  • Text extraction is not very helpful, as the extracted values cannot be mapped to their fields. – hrishi Aug 09 '17 at 09:39
  • It depends. You can use TextExtractionStrategies that take a specific location (a Rectangle) as their input, which allows a more targeted approach. Once you have the text at a certain (roughly defined) position, you can use regular expressions to refine the result further. – Joris Schellekens Aug 09 '17 at 09:40
  • OK, thanks, I will check it. I'm not very familiar with PDFs; I use iText Java code to read XFA forms. Can you share a link to sample code that shows how to do this programmatically? – hrishi Aug 09 '17 at 09:49
  • Sample code for both pdf2Data and text extraction can be found on the website. Also, upvote my answer (or mark it as accepted) if it helped you. – Joris Schellekens Aug 09 '17 at 09:51
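The location-based approach from the comments can be sketched with iText 7 as follows, again assuming the kernel module is available. A `TextRegionEventFilter` wrapped in a `FilteredTextEventListener` restricts extraction to one rectangle per field. The field names and rectangle coordinates are hypothetical; you would measure them for your own documents. Coordinates are in PDF user-space units (1/72 inch), usually measured from the lower-left corner of the page. Text near the edges of a rectangle may be included or cut off depending on how the PDF groups its text, so the result is only roughly bounded and may still need a regular expression to refine it.

```java
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.filter.TextRegionEventFilter;
import com.itextpdf.kernel.pdf.canvas.parser.listener.FilteredTextEventListener;
import com.itextpdf.kernel.pdf.canvas.parser.listener.ITextExtractionStrategy;
import com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class RegionTextExtractor {

    /** Returns the text found inside the given rectangle on the given (1-based) page. */
    static String extractRegion(PdfDocument pdfDoc, int pageNumber, Rectangle region) {
        // Only text events that pass the region filter reach the location-based strategy
        ITextExtractionStrategy strategy = new FilteredTextEventListener(
                new LocationTextExtractionStrategy(), new TextRegionEventFilter(region));
        return PdfTextExtractor.getTextFromPage(pdfDoc.getPage(pageNumber), strategy).trim();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical field layout: (x, y, width, height) in user-space units
        Map<String, Rectangle> fields = new LinkedHashMap<>();
        fields.put("name", new Rectangle(100, 700, 250, 20));
        fields.put("date", new Rectangle(400, 700, 150, 20));

        try (PdfDocument pdfDoc = new PdfDocument(new PdfReader(args[0]))) {
            for (Map.Entry<String, Rectangle> field : fields.entrySet()) {
                System.out.println(field.getKey() + " = "
                        + extractRegion(pdfDoc, 1, field.getValue()));
            }
        }
    }
}
```

Mapping each field name to its own rectangle is what gives you the field-to-value mapping that plain page text lacks; the resulting map of names to values can then be written to XML or straight into the database.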