
I've implemented a simple text extraction method in Java using PDFBox 1.8.10. For other reasons, I have to upgrade the library to PDFBox 2.0.2, and the PDFTextStripper class seems to have been removed or moved to another package in the new version. Is there any way to get around this problem? Or can you suggest another way to get text from a PDF?

Here is my code:

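import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // PDFBox 1.8.x location

public String extractTextFromPdf() throws IOException {
    File inputFile = new File("c:/lorem/ipsum.pdf");
    PDDocument doc = PDDocument.load(inputFile);
    try {
        return new PDFTextStripper().getText(doc);
    } finally {
        doc.close(); // close the document even if extraction fails
    }
}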

Thanks in advance.

bcakmak
  • What IDE are you using? In NetBeans, press Ctrl-Shift-I and the imports will be fixed automatically. In Eclipse, press Ctrl-Shift-O. – Tilman Hausherr Aug 01 '16 at 10:11
  • @TilmanHausherr Thanks. I am using Eclipse. After restarting it, the problem is fixed; I think it was a temporary error. PDFBox moved the PDFTextStripper class from the 'org.apache.pdfbox.util' package to the 'org.apache.pdfbox.text' package. What a development... – bcakmak Aug 01 '16 at 11:02
  • Glad it works. Please delete your question, as this is something rather trivial. Or answer it yourself. – Tilman Hausherr Aug 01 '16 at 11:04
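Based on the package move described in the comment above, a minimal sketch of the question's method ported to PDFBox 2.0.x might look like this. It assumes PDFBox 2.0.x is on the classpath; the class name `PdfTextExtractor` and passing the file as a parameter (instead of the hard-coded `c:/lorem/ipsum.pdf`) are illustrative choices, not part of the original code.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper; // 2.0.x package (was org.apache.pdfbox.util in 1.8.x)

// Hypothetical helper class; only the import changes compared to the 1.8.x code.
public class PdfTextExtractor {

    // Loads the PDF, extracts the text of all pages, and closes the document.
    public static String extractTextFromPdf(File pdfFile) throws IOException {
        // PDDocument implements Closeable in 2.0.x, so try-with-resources closes it
        try (PDDocument document = PDDocument.load(pdfFile)) {
            return new PDFTextStripper().getText(document);
        }
    }
}
```

Call it as `PdfTextExtractor.extractTextFromPdf(new File("c:/lorem/ipsum.pdf"))`. The try-with-resources block takes the place of the explicit `close()` call, so the document is closed even if `getText` throws.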

1 Answer


Try this:

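import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.PDFTextStripperByArea;

public class ExtractText {
    public static void main(String[] args) {
        // try-with-resources closes the document when the block exits
        try (PDDocument document = PDDocument.load(new File("test.pdf"))) {
            document.getClass();
            if (!document.isEncrypted()) {
                PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                stripper.setSortByPosition(true);
                PDFTextStripper textStripper = new PDFTextStripper();
                String text = textStripper.getText(document);
                System.out.println("Text:" + text);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}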
rdonuk
SerefAltindal
    This is not an answer to the question. Additionally, `document.getClass();` has no effect. `if (!document.isEncrypted())` is not needed. – Tilman Hausherr Aug 01 '16 at 10:33
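Applying that comment, a trimmed sketch of the answer's code might drop the no-op `getClass()` call and the `isEncrypted()` check, and call `setSortByPosition(true)` on the stripper that actually produces the text (in the answer it is set on a `PDFTextStripperByArea` that is never used). This assumes PDFBox 2.0.x; `SortedTextExtractor` and `extractSortedText` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class illustrating the trimmed-down answer.
public class SortedTextExtractor {

    // Extracts the text of all pages, ordered by on-page position.
    public static String extractSortedText(File pdfFile) throws IOException {
        try (PDDocument document = PDDocument.load(pdfFile)) {
            PDFTextStripper stripper = new PDFTextStripper();
            // sort text by its position on the page instead of content-stream order
            stripper.setSortByPosition(true);
            return stripper.getText(document);
        }
    }
}
```

Whether sorting by position is desirable depends on the PDF; for documents whose content stream is already in reading order, the default (unsorted) output is usually fine.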