Text extraction from PDF using PDFBox 2.0.2 missing class PDFTextStripper()

Question

I've implemented a simple text extraction method using PDFBox 1.8.10 in Java. For various reasons I have to upgrade the library to PDFBox 2.0.2. It looks like the PDFTextStripper class has been removed or moved to another package in the new version. Is there any way around this problem? Or can you suggest another way to get the text from a PDF?

Here is my code:

public String extractTextFromPdf() throws IOException {
    File jInputFile = new File("c:/lorem/ipsum.pdf");
    PDDocument pdDoc = PDDocument.load(jInputFile);
    try {
        return new PDFTextStripper().getText(pdDoc);
    } finally {
        pdDoc.close(); // release the document even if extraction fails
    }
}

Thanks in advance.

What IDE are you using? In NetBeans, press Ctrl-Shift-I, and the import will be fixed automatically. In Eclipse, press Ctrl-Shift-O. – Tilman Hausherr, Aug 01 '16 at 10:11
@TilmanHausherr Thanks, man. I am using Eclipse. After restarting, it was fixed, so I think it was a temporary error. PDFBox moved the PDFTextStripper class from the 'org.apache.pdfbox.util' package to the 'org.apache.pdfbox.text' package. What a development... – bcakmak, Aug 01 '16 at 11:02
Glad it works. Please delete your question, as this is something rather trivial. Or answer it yourself. – Tilman Hausherr, Aug 01 '16 at 11:04
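
The package move described in the comments above can be checked from code. The following is only a diagnostic sketch: it tries the two package names mentioned in the comments and reports which one resolves on the current classpath. The class name StripperLocator and the method firstResolvable are hypothetical names made up for this illustration; they are not part of PDFBox, and the sketch uses only the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Diagnostic sketch (hypothetical helper, not part of PDFBox):
// reports which known location of PDFTextStripper is on the classpath.
final class StripperLocator {

    // Package locations taken from the comments above.
    static final List<String> CANDIDATES = Arrays.asList(
            "org.apache.pdfbox.util.PDFTextStripper",   // PDFBox 1.8.x
            "org.apache.pdfbox.text.PDFTextStripper");  // PDFBox 2.0.x

    // Returns the first class name that can be loaded, or empty if none can.
    static Optional<String> firstResolvable(List<String> classNames) {
        for (String name : classNames) {
            try {
                // initialize=false: only look the class up, do not run static initializers.
                Class.forName(name, false, StripperLocator.class.getClassLoader());
                return Optional.of(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not available under this name; try the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstResolvable(CANDIDATES)
                .map(name -> "Import this: " + name)
                .orElse("No PDFTextStripper found; is the PDFBox jar on the classpath?"));
    }
}
```

Run with the PDFBox jar on the classpath, this should print the import that matches the installed version. If it reports the 'org.apache.pdfbox.text' name, changing the import line in the question's code to that package is likely all that is needed.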

score 0 · Answer 1 · edited Aug 01 '16 at 10:44

Try this:

try {
    PDDocument document = null;
    document = PDDocument.load(new File("test.pdf"));
    document.getClass();
    if (!document.isEncrypted()) {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        stripper.setSortByPosition(true);
        PDFTextStripper Tstripper = new PDFTextStripper();
        String st = Tstripper.getText(document);
        System.out.println("Text:" + st);
    }
} catch (Exception e) {
    e.printStackTrace();
}

edited Aug 01 '16 at 10:44 by rdonuk
answered Aug 01 '16 at 09:35 by SerefAltindal

This is not an answer to the question. Additionally, `document.getClass();` has no effect. `if (!document.isEncrypted())` is not needed. – Tilman Hausherr Aug 01 '16 at 10:33
