I use iText to read a PDF document containing an XFA form. I convert the form to XML, read the data from the XML and insert it into a database. But if the PDF does not contain an XFA form, how can I efficiently read data from it?
It depends on your expectations.
You can use text extraction to retrieve all the text on a given page. How you then process that text is up to you (for example, with regular expressions).
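As a rough sketch of that approach, assuming iText 7 with its `kernel` module on the classpath, you can extract one page with `PdfTextExtractor.getTextFromPage` and then pull values out with one regular expression per field. The class name `PageTextParser`, the field labels and the patterns are hypothetical; replace them with the labels printed in your own documents.

```java
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageTextParser {

    /** Extracts all text from one page (1-based page number) using iText 7's default strategy. */
    static String extractPageText(String pdfPath, int pageNumber) throws IOException {
        PdfDocument doc = new PdfDocument(new PdfReader(pdfPath));
        try {
            return PdfTextExtractor.getTextFromPage(doc.getPage(pageNumber));
        } finally {
            doc.close();
        }
    }

    /**
     * Applies one regex per field; group 1 of each pattern is taken as the value.
     * Fields whose pattern does not match are left out of the result.
     */
    static Map<String, String> parseFields(String text, Map<String, Pattern> patterns) {
        Map<String, String> values = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
            Matcher m = e.getValue().matcher(text);
            if (m.find()) {
                values.put(e.getKey(), m.group(1).trim());
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java PageTextParser <file.pdf>");
            return;
        }
        // Hypothetical labels: "." does not match newlines, so "(.+)" captures the rest of the line.
        Map<String, Pattern> patterns = new LinkedHashMap<>();
        patterns.put("customer", Pattern.compile("Customer:\\s*(.+)"));
        patterns.put("date", Pattern.compile("Date:\\s*(\\d{4}-\\d{2}-\\d{2})"));

        String text = extractPageText(args[0], 1);
        // The resulting map could then be passed to your database insert code.
        System.out.println(parseFields(text, patterns));
    }
}
```

This only works when the labels next to each value are stable across documents; if they are not, the values cannot be mapped reliably, which is the limitation raised in the comments.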
You can also opt for pdf2Data, an iText 7 add-on that lets you match documents against templates. pdf2Data seems like a good fit here, since it produces XML files as its output.
More information on pdf2Data can be found here: http://itextpdf.com/itext7/pdf2Data

Joris Schellekens
- Text extraction is not much help, because the values cannot be mapped. – hrishi Aug 09 '17 at 09:39
- It depends. You can use TextExtractionStrategies that take a specific location (a Rectangle) as their input, which allows a more targeted approach. Once you have the text at a certain (roughly defined) position, you can use regular expressions to refine the result further. – Joris Schellekens Aug 09 '17 at 09:40
- OK, thanks, I will check it. I am not very familiar with PDFs; I use iText Java code to read XFA forms. Can you share a link to sample code that shows how to use this programmatically? – hrishi Aug 09 '17 at 09:49
- Sample code for both pdf2Data and text extraction can be found on the website. Also, upvote my answer (or mark it as accepted) if it helped you. – Joris Schellekens Aug 09 '17 at 09:51
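The region-based approach described in the comments (a `Rectangle` combined with a text extraction strategy, then a regular expression on the result) might look roughly like this. It is a sketch, assuming iText 7.1 or later with the `kernel` and `io` modules on the classpath; `RegionExtractor`, `createSamplePdf` and all coordinates are hypothetical, and the generated sample PDF exists only so the example runs without an input file. `TextRegionEventFilter` keeps text whose baseline intersects the rectangle, and `FilteredTextEventListener` forwards only that text to a `LocationTextExtractionStrategy`.

```java
import com.itextpdf.io.font.constants.StandardFonts;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.kernel.pdf.canvas.parser.PdfTextExtractor;
import com.itextpdf.kernel.pdf.canvas.parser.filter.TextRegionEventFilter;
import com.itextpdf.kernel.pdf.canvas.parser.listener.FilteredTextEventListener;
import com.itextpdf.kernel.pdf.canvas.parser.listener.ITextExtractionStrategy;
import com.itextpdf.kernel.pdf.canvas.parser.listener.LocationTextExtractionStrategy;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionExtractor {

    /** Returns only the text on the given page whose baseline intersects the given area. */
    static String extractRegion(byte[] pdf, int pageNumber, Rectangle area) throws IOException {
        PdfDocument doc = new PdfDocument(new PdfReader(new ByteArrayInputStream(pdf)));
        try {
            ITextExtractionStrategy strategy = new FilteredTextEventListener(
                    new LocationTextExtractionStrategy(), new TextRegionEventFilter(area));
            return PdfTextExtractor.getTextFromPage(doc.getPage(pageNumber), strategy);
        } finally {
            doc.close();
        }
    }

    /** Demo only: builds a one-page PDF with two lines of text at fixed positions. */
    static byte[] createSamplePdf() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PdfDocument doc = new PdfDocument(new PdfWriter(out));
        PdfCanvas canvas = new PdfCanvas(doc.addNewPage());
        canvas.beginText()
              .setFontAndSize(PdfFontFactory.createFont(StandardFonts.HELVETICA), 12)
              .moveText(100, 700).showText("Invoice No: 12345")  // baseline at y = 700
              .moveText(0, -200).showText("Total: 99.00")        // baseline at y = 500
              .endText();
        doc.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = createSamplePdf();
        // Rectangle(x, y, width, height) in PDF user units; the origin is the bottom-left corner.
        String raw = extractRegion(pdf, 1, new Rectangle(90, 690, 300, 30));
        // Refine the roughly located text with a regular expression.
        Matcher m = Pattern.compile("Invoice No:\\s*(\\d+)").matcher(raw);
        if (m.find()) {
            System.out.println("Invoice number: " + m.group(1));
        }
    }
}
```

The rectangle only needs to be roughly right: anything else whose baseline crosses it is also returned, so the regular expression does the final narrowing.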