
We have some existing C# code that uses iTextSharp to parse PDF documents produced by iText. Our app supports importing information into our system from PDFs. I am new to PDF extraction.

A new version of the PDF document is now out, but it is produced with RealObjects PDFreactor. These PDF documents no longer import correctly: the extracted pages do not come through in a clean top-down order but are mixed up.

I am assuming the issue is caused by the PDF now being created with PDFreactor rather than iText, but I could be wrong.

Do I have to update my code to support PDFreactor so that the import extracts pages correctly?

Does the extractor add-in have to be the same as the tool used to create the PDF?

  • You don't show *how* you are extracting content, so how could we tell you how to improve your code? You don't even say *what kind* of content you extract: text, bitmaps, vector graphics, or a mix thereof. As a first guess, it sounds like you have to use a different text extraction strategy. For more details, please show your pivotal extraction code and share an example PDF with actual and expected extraction results. – mkl Jul 28 '23 at 05:25
  • I'm not asking for code help. I am asking whether I have to use the same tool for extraction as was used to create the PDF. The original version was created using iText; the latest version is created using PDFreactor. – Kevin J Jul 28 '23 at 20:52
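To connect the comment about extraction strategies with the question: in general the extractor does not have to come from the same vendor as the producer, because both work against the PDF format. However, the order in which a producer writes text into a page's content stream is the producer's own choice. A strategy that emits text in stream order (iTextSharp's `SimpleTextExtractionStrategy` behaves roughly this way) reproduces whatever order the producer chose. A location-based strategy (iTextSharp's `LocationTextExtractionStrategy`) instead sorts text by its position on the page. The self-contained C# 9+ sketch below illustrates that difference without using iTextSharp. The chunk data, the helper names `ExtractInStreamOrder` and `ExtractByLocation`, and the line tolerance are all hypothetical, and real strategies also handle rotation, spacing and fonts, which this toy version ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical text chunks for one page, listed in the order a producer
// happened to write them into the content stream. Coordinates follow PDF
// user space: x grows to the right, y grows upward.
var chunksInStreamOrder = new List<(string Text, float X, float Y)>
{
    ("Footer", 72f, 40f),         // written first, but sits at the bottom
    ("Heading", 72f, 750f),       // top of the page
    ("Right column", 300f, 700f), // same line as "Left column", further right
    ("Left column", 72f, 700f),
};

// Stream order: simply concatenate chunks in the order they were written.
string ExtractInStreamOrder(IEnumerable<(string Text, float X, float Y)> chunks) =>
    string.Join("\n", chunks.Select(c => c.Text));

// Location order: group chunks with (nearly) equal y into lines,
// emit lines top-down and chunks within a line left-to-right.
string ExtractByLocation(IEnumerable<(string Text, float X, float Y)> chunks, float lineTolerance = 2f)
{
    var lines = new List<List<(string Text, float X, float Y)>>();
    foreach (var chunk in chunks.OrderByDescending(c => c.Y).ThenBy(c => c.X))
    {
        var current = lines.LastOrDefault();
        if (current != null && Math.Abs(current[0].Y - chunk.Y) <= lineTolerance)
            current.Add(chunk); // close enough to the current line's baseline
        else
            lines.Add(new List<(string Text, float X, float Y)> { chunk }); // start a new line
    }
    return string.Join("\n",
        lines.Select(line => string.Join(" ", line.OrderBy(c => c.X).Select(c => c.Text))));
}

Console.WriteLine("Stream order:\n" + ExtractInStreamOrder(chunksInStreamOrder));
Console.WriteLine();
Console.WriteLine("Location order:\n" + ExtractByLocation(chunksInStreamOrder));
```

With this data, stream order yields "Footer", "Heading", "Right column", "Left column", while location order yields "Heading", then "Left column Right column", then "Footer". If the real import code follows stream order, a producer that writes text in a different sequence would produce exactly the kind of mixed-up output described in the question. Note that even a location-based strategy can interleave multi-column layouts, so checking the actual extraction code against a sample PDF is still the reliable way to tell.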
