I am extracting text from a PDF, and the same text is being returned for sequential pages. I have written a few PDF parsers using iTextSharp and have just ported the following code from iTextSharp to iText7, on the flawed assumption that this was only an iTextSharp issue:
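    using System;
    using iText.Kernel.Pdf;
    using iText.Kernel.Pdf.Canvas.Parser;
    using iText.Kernel.Pdf.Canvas.Parser.Listener;

    // Dispose the document when done so the file handle is released
    using (var pdfDocument = new PdfDocument(new PdfReader(@"C:\Temp\MyForm.pdf")))
    {
        for (int page = 1; page <= pdfDocument.GetNumberOfPages(); page++)
        {
            // A new strategy instance is created for each page
            var strategy = new SimpleTextExtractionStrategy();
            var pdfPage = pdfDocument.GetPage(page);
            var currentText = PdfTextExtractor.GetTextFromPage(pdfPage, strategy);
            // Process this page
            Console.WriteLine("PAGE {0}", page);
            Console.WriteLine(currentText);
        }
    }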
Is there something I'm missing here?