Hello, I want to convert a PDF file to a text file. I am converting it, but the result doesn't preserve the text formatting exactly as it appears in the PDF file.
Please help me.
A text file by itself cannot contain formatting.
You cannot preserve formatting in a plain text file because it only contains text. A text file could contain HTML markup, but then I would call it an HTML file. Otherwise, you should convert the PDF to Rich Text Format (RTF), Microsoft Word, OpenOffice, or some other document format instead.
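Fonts, sizes and colours cannot survive in plain text, but the rough spatial layout (indentation and columns) can be approximated by padding with spaces according to where each piece of text sits on the page. The following is a minimal, stdlib-only sketch of that idea, not PDFBox code. It assumes some extractor has already produced text runs with page coordinates; the Fragment type and the charWidth and lineHeight values used to map coordinates onto a character grid are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LayoutSketch {

    // Hypothetical fragment type: a run of text at page coordinates,
    // x measured from the left edge and y from the top edge.
    static final class Fragment {
        final String text;
        final double x;
        final double y;

        Fragment(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    // charWidth and lineHeight are assumed average glyph width and line
    // spacing, used to map coordinates onto a row/column character grid.
    static String render(List<Fragment> fragments, double charWidth, double lineHeight) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        // Sort by grid row, then grid column, so input order does not matter.
        sorted.sort(Comparator
                .comparingLong((Fragment f) -> Math.round(f.y / lineHeight))
                .thenComparingLong(f -> Math.round(f.x / charWidth)));

        List<StringBuilder> rows = new ArrayList<>();
        for (Fragment f : sorted) {
            int row = (int) Math.round(f.y / lineHeight);
            int col = (int) Math.round(f.x / charWidth);
            while (rows.size() <= row) {
                rows.add(new StringBuilder());
            }
            StringBuilder line = rows.get(row);
            if (line.length() < col) {
                // Pad with spaces so the fragment starts at its column.
                while (line.length() < col) {
                    line.append(' ');
                }
            } else if (line.length() > 0) {
                // Column already occupied: keep fragments apart with one space.
                line.append(' ');
            }
            line.append(f.text);
        }

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) {
                out.append('\n');
            }
            out.append(rows.get(i));
        }
        return out.toString();
    }
}
```

For example, rendering "Total" at (0, 0), "42" at (60, 0), "Tax" at (0, 12) and "7" at (60, 12) with charWidth 6 and lineHeight 12 gives two lines whose numbers both start at column 10. This only keeps positions; bold, italics and font sizes are still lost, which is the point made above.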
This code, which uses Apache PDFBox, can help you:
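    import java.io.File;
    import java.io.FileInputStream;
    import org.apache.pdfbox.cos.COSDocument;
    import org.apache.pdfbox.pdfparser.PDFParser;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.util.PDFTextStripper; // PDFBox 1.x package

    // Returns the text of the PDF at fileName, or null if it cannot be read.
    public static String pdfToText(String fileName) {
        File f = new File(fileName);
        if (!f.isFile()) {
            return null;
        }
        COSDocument cosDoc = null;
        PDDocument pdDoc = null;
        try {
            PDFParser parser = new PDFParser(new FileInputStream(f));
            parser.parse();
            cosDoc = parser.getDocument();
            pdDoc = new PDDocument(cosDoc);
            PDFTextStripper pdfStripper = new PDFTextStripper();
            // To extract only some pages:
            // pdfStripper.setStartPage(2);
            // pdfStripper.setEndPage(3);
            return pdfStripper.getText(pdDoc);
        } catch (Exception e) {
            System.out.println("An exception occurred while parsing the PDF document.");
            e.printStackTrace();
            return null;
        } finally {
            // Closing the PDDocument also closes the underlying COSDocument.
            try {
                if (pdDoc != null) {
                    pdDoc.close();
                } else if (cosDoc != null) {
                    cosDoc.close();
                }
            } catch (Exception e1) {
                e1.printStackTrace();
            }
        }
    }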
PDFBox will help you with this, although it may lose some formatting, as Erick Robertson said.
Refer to "PDF Text Parser: Converting PDF to Text in Java using PDFBox".
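The snippet above only extracts the text into a String; it does not write the text file the question asks for. A minimal sketch of that last step, using only java.nio.file, is shown here. The class and method names are hypothetical, and writing UTF-8 is an assumption, since the original code does not specify an encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class TextFileWriter {

    // Hypothetical helper: saves text extracted from a PDF as a UTF-8 .txt file.
    static void saveAsTextFile(String text, Path output) throws IOException {
        if (text == null) {
            // The extraction code returns null when the PDF could not be read,
            // so fail loudly instead of silently producing nothing.
            throw new IllegalArgumentException("no text was extracted from the PDF");
        }
        // Encode explicitly as UTF-8 so non-ASCII characters survive
        // regardless of the platform's default charset.
        Files.write(output, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

Usage would then be something like TextFileWriter.saveAsTextFile(text, Paths.get("output.txt")), where text is the String returned by the extraction code. Encoding explicitly matters because PDFs often contain characters (accents, ligatures, typographic quotes) that a platform default such as Windows-1252 may not represent.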