I have a problem extracting text from a PDF.

I use the com.itextpdf library, version 5.0.6.

Link to PDF: http://sendfile.pl/pobierz/351149---hDwE.html

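        import com.itextpdf.text.pdf.PdfReader;
        import com.itextpdf.text.pdf.parser.PdfTextExtractor;

        // The class name is only illustrative.
        public class ExtractText {
            public static void main(String[] args) {
                try {
                    PdfReader reader = new PdfReader("C:\\Users\\lukas\\Desktop\\test.pdf");
                    int n = reader.getNumberOfPages(); // n is 1 for this file
                    // Extract the text of page 1 (pages are numbered from 1).
                    String str = PdfTextExtractor.getTextFromPage(reader, 1);
                    System.out.println(str);
                    reader.close();
                } catch (Exception e) {
                    System.out.println(e);
                }
            }
        }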

The variable str contains only square placeholder characters instead of the actual text.

lukas rr
  • Most likely the text compression method is not supported by the library. I had the same problem with PDFBox. – May 14 '15 at 16:41
  • iText 5.0.6 is ancient. Back then, the text extraction feature was essentially a proof of concept. Please update to a current 5.5.x version and try again. – mkl May 14 '15 at 20:09
  • I just inspected the provided sample PDF. While my advice to update is still good, it won't help here. The PDF simply does not contain the information required for text extraction by the means described in the PDF specification. A good first test is usually to try to copy and paste the text from Adobe Reader, and this test fails here, too. – mkl May 14 '15 at 21:09
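
The copy-and-paste test from the last comment can be roughly approximated in code by checking whether the extracted string is mostly made of characters that rarely occur in real text. The sketch below is only a heuristic and uses nothing but the Java standard library. It assumes that the squares in the output are replacement, private-use, unassigned, or control characters, which the question does not confirm. The class name, the method names, and the 0.5 threshold are hypothetical and not part of iText.

```java
// Heuristic sketch: class and method names are illustrative, not part of iText.
final class ExtractionCheck {

    // Returns the fraction of code points that rarely occur in real text:
    // U+FFFD (replacement character), private-use code points, unassigned
    // code points, and control characters other than tab, LF, and CR.
    static double suspiciousRatio(String text) {
        if (text == null || text.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        int suspicious = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            total++;
            int type = Character.getType(cp);
            boolean harmlessControl = cp == '\t' || cp == '\n' || cp == '\r';
            if (cp == 0xFFFD
                    || type == Character.PRIVATE_USE
                    || type == Character.UNASSIGNED
                    || (type == Character.CONTROL && !harmlessControl)) {
                suspicious++;
            }
        }
        return (double) suspicious / total;
    }

    // The 0.5 threshold is an arbitrary assumption; tune it for your documents.
    static boolean looksUnmapped(String text) {
        return suspiciousRatio(text) > 0.5;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by the text extractor.
        String sample = "\uFFFD\uFFFD\uFFFD a";
        System.out.println(suspiciousRatio(sample));
        System.out.println(looksUnmapped(sample));
    }
}
```

In the code from the question, the string returned by getTextFromPage would be passed to looksUnmapped. A true result only hints that the PDF lacks usable character mappings; it is not proof, and updating the library would not change it.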
