I'm working on a servlet in a web project, and this is my code.
I'm using version 2.0.0 of the PDFBox library, and the same code works in a plain Java application.
pdfmanager.java:
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.io.RandomAccessFile;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class pdfManager {

    private PDFParser parser;
    private PDFTextStripper pdfStripper;
    private PDDocument pdDoc;
    private COSDocument cosDoc;
    private String Text;
    private String filePath;
    private File file;

    public pdfManager() {
    }

    // Parses the PDF at filePath and returns the text of pages 1 to 10.
    public String ToText() throws IOException {
        this.pdfStripper = null;
        this.pdDoc = null;
        this.cosDoc = null;

        file = new File(filePath);
        parser = new PDFParser(new RandomAccessFile(file, "r")); // update for PDFBox 2.0
        parser.parse();
        cosDoc = parser.getDocument();
        pdfStripper = new PDFTextStripper();
        pdDoc = new PDDocument(cosDoc);
        pdDoc.getNumberOfPages();
        // Read text from page 1 to page 10.
        pdfStripper.setStartPage(1);
        pdfStripper.setEndPage(10);
        // To get the text of the whole PDF file, use this instead:
        // pdfStripper.setEndPage(pdDoc.getNumberOfPages());
        Text = pdfStripper.getText(pdDoc);
        return Text;
    }

    public void setFilePath(String filePath) {
        this.filePath = filePath;
    }
}
The servlet file:
PrintWriter out = response.getWriter() ;
out.println("\ndata we gottoo : ") ;
pdfManager pdfManager = new pdfManager();
pdfManager.setFilePath("/Users/rami/Desktop/pdf2.pdf");
System.out.println(pdfManager.ToText());
This is called in the doGet method.
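To narrow down why the same code behaves differently inside the servlet, a small check like the one below might help. It is only a sketch: `PdfEnvCheck` and its two methods are hypothetical names that are not part of my project, and it assumes the difference between the two environments is either that the PDFBox jar is not visible to the web application's class loader or that the servlet container cannot read the file.

```java
import java.io.File;

// Hypothetical diagnostic helper; not part of the original project.
public class PdfEnvCheck {

    // Returns true if the named class can be loaded by the class loader
    // that loaded this helper (inside a web app, the web app's loader).
    public static boolean isClassAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Describes whether the path points to a regular file the process can read.
    public static String describeFile(String path) {
        File f = new File(path);
        if (!f.exists()) {
            return "missing";
        }
        if (!f.isFile()) {
            return "not a file";
        }
        return f.canRead() ? "readable" : "not readable";
    }

    public static void main(String[] args) {
        System.out.println("PDFBox visible: "
                + isClassAvailable("org.apache.pdfbox.pdmodel.PDDocument"));
        System.out.println("pdf2.pdf: "
                + describeFile("/Users/rami/Desktop/pdf2.pdf"));
    }
}
```

Calling the same two methods from doGet and writing the results to `out` would show whether the servlet sees the same classpath and file as the standalone application.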