PDF contains text, but iText doesn't see it

Asked May 14 '15 at 16:32

I have a problem with getting text from a PDF.

I am using the com.itextpdf library, version 5.0.6.

Link to PDF: http://sendfile.pl/pobierz/351149---hDwE.html

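        // Needs: import com.itextpdf.text.pdf.PdfReader;
        //        import com.itextpdf.text.pdf.parser.PdfTextExtractor;
        try {
            PdfReader reader = new PdfReader("C:\\Users\\lukas\\Desktop\\test.pdf");
            int n = reader.getNumberOfPages(); // 1 for this file
            // Extract the text of the first (and only) page
            String str = PdfTextExtractor.getTextFromPage(reader, 1);
            System.out.println(str);
            reader.close();
        }
        catch (Exception e) {
            System.out.println(e);
        }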

The variable str contains only a square character instead of the text on the page.
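To find out which character the square really is, the code points of str could be printed. Below is a minimal sketch of that idea; the class CodePointDump, the helper describe, and the sample string are made up for illustration and are not part of iText.

```java
public class CodePointDump {
    // Formats every code point of s as U+XXXX, separated by single spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the value returned by getTextFromPage
        String str = "\uFFFD\n";
        System.out.println(describe(str));
    }
}
```

In the real program, describe(str) would be called on the extracted string. A value such as U+FFFD (the replacement character) or one in the private use area would suggest that the characters could not be mapped to Unicode.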

asked May 14 '15 at 16:32

lukas rr

Most likely the text compression method is not supported by the library. I had the same problem with PDFBox. – May 14 '15 at 16:41
iText 5.0.6 is ancient. Back then, the text extraction feature was essentially just a proof of concept. Please update to a current 5.5.x version and try again. – mkl May 14 '15 at 20:09
I just inspected the provided sample PDF. While my advice to update still stands, it won't help here. The PDF simply does not contain the information required for text extraction by the means described in the PDF specification. A good first test is usually to try to copy and paste the text from Adobe Reader, and that test fails here, too. – mkl May 14 '15 at 21:09
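
The copy-and-paste test has a rough programmatic counterpart: check whether the extracted string consists mostly of unusable characters. The sketch below is only a heuristic under stated assumptions; the class ExtractionCheck, its method names, and the 0.3 threshold are invented for illustration and are not part of iText or the PDF specification.

```java
public class ExtractionCheck {
    // Fraction of non-whitespace code points that look unusable:
    // U+FFFD, control, private-use, unassigned, or lone surrogate characters.
    // An empty or whitespace-only string counts as fully unusable (1.0).
    static double suspiciousRatio(String text) {
        int total = 0;
        int bad = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                continue; // whitespace says nothing about extraction quality
            }
            total++;
            int type = Character.getType(cp);
            if (cp == 0xFFFD
                    || type == Character.CONTROL
                    || type == Character.PRIVATE_USE
                    || type == Character.UNASSIGNED
                    || type == Character.SURROGATE) {
                bad++;
            }
        }
        return total == 0 ? 1.0 : (double) bad / total;
    }

    // Assumed cut-off: treat the text as extractable if fewer than 30%
    // of its characters are suspicious. The value is arbitrary.
    static boolean looksExtractable(String text) {
        return suspiciousRatio(text) < 0.3;
    }

    public static void main(String[] args) {
        System.out.println(looksExtractable("Hello world"));
        System.out.println(looksExtractable("\uFFFD\uE000"));
    }
}
```

Passing the result of getTextFromPage to looksExtractable would flag pages like the one in this question. It cannot repair them: if the PDF lacks the mapping to Unicode, something like OCR would be needed instead.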
