
Hello, I want to convert a PDF file to a text file. I can do the conversion, but the resulting text file doesn't preserve the formatting of the text exactly as it appears in the PDF.

Please help me.

famousgarkin
MJ13

3 Answers


A text file by itself cannot contain formatting.

You cannot preserve formatting in a plain text file because it contains only text. A text file could contain HTML markup, but then I would call it an HTML file. Otherwise, you should convert the PDF into Rich Text Format (RTF), a Microsoft Word or OpenOffice document, or some other document type instead.
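To illustrate the point about markup, here is a minimal sketch in plain Java (no PDF library involved) showing that styling can only survive in a text file if it is encoded as markup such as HTML. The `Run` class and its bold/italic flags are hypothetical stand-ins for whatever style information a PDF extractor might report; they are not part of PDFBox or any other library.

```java
import java.util.Arrays;
import java.util.List;

public class RunsToHtml {

    // Hypothetical styled text fragment (not a PDFBox type).
    static final class Run {
        final String text;
        final boolean bold;
        final boolean italic;

        Run(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    // Escape the characters that have special meaning in HTML ('&' first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Plain text: the styling is simply dropped.
    static String toPlainText(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run r : runs) {
            sb.append(r.text);
        }
        return sb.toString();
    }

    // HTML: the styling is kept, but only as markup inside the text.
    static String toHtml(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run r : runs) {
            String t = escape(r.text);
            if (r.italic) t = "<i>" + t + "</i>";
            if (r.bold) t = "<b>" + t + "</b>";
            sb.append(t);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
                new Run("Title", true, false),
                new Run(": a <short> note", false, true));
        System.out.println(toPlainText(runs)); // Title: a <short> note
        System.out.println(toHtml(runs));      // <b>Title</b><i>: a &lt;short&gt; note</i>
    }
}
```

The plain-text output is identical whether or not "Title" was bold; only the HTML version still carries that information, and it does so by becoming HTML rather than plain text.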

Erick Robertson

This can help you. It uses Apache PDFBox to extract the text of a PDF file:

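import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper; // PDFBox 1.x package

public class PdfToText {

    // Returns the extracted text, or null if the file is missing or cannot be parsed.
    // Written against the PDFBox 1.x API (PDFParser(InputStream), PDFTextStripper in
    // org.apache.pdfbox.util); later PDFBox versions moved or changed these classes.
    public static String extractText(String fileName) {
        File f = new File(fileName);
        if (!f.isFile()) {
            return null;
        }

        COSDocument cosDoc = null;
        PDDocument pdDoc = null;
        try (InputStream in = new FileInputStream(f)) {
            PDFParser parser = new PDFParser(in);
            parser.parse();
            cosDoc = parser.getDocument();
            pdDoc = new PDDocument(cosDoc);

            PDFTextStripper pdfStripper = new PDFTextStripper();
            // Optionally restrict extraction to a page range:
            // pdfStripper.setStartPage(2);
            // pdfStripper.setEndPage(3);
            return pdfStripper.getText(pdDoc);
        } catch (Exception e) {
            System.out.println("An exception occurred in parsing the PDF document.");
            e.printStackTrace();
            return null;
        } finally {
            // Release the document on success as well as on failure.
            try {
                if (pdDoc != null) {
                    pdDoc.close(); // also closes the underlying COSDocument
                } else if (cosDoc != null) {
                    cosDoc.close();
                }
            } catch (Exception e1) {
                e1.printStackTrace();
            }
        }
    }
}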
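The PDFBox snippet returns the extracted text as a String but never writes the text file the question asks for. A minimal sketch of that last step using only the JDK, assuming UTF-8 output is acceptable; `TextFileWriter.writeTextFile` is a hypothetical helper name, and its `text` argument would be the (possibly null) string returned by the extraction code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TextFileWriter {

    // Hypothetical helper: writes already-extracted PDF text to a .txt file.
    // Returns false if there is nothing to write (extraction returned null)
    // or if the file could not be written.
    static boolean writeTextFile(String text, Path target) {
        if (text == null) {
            return false;
        }
        try {
            // Encode explicitly as UTF-8 so non-ASCII characters survive
            // regardless of the platform's default charset.
            Files.write(target, text.getBytes(StandardCharsets.UTF_8));
            return true;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) {
        // Usage: java TextFileWriter out.txt
        Path out = Paths.get(args.length > 0 ? args[0] : "output.txt");
        String text = "Page 1 text\nPage 2 text\n"; // stand-in for the extracted text
        System.out.println(writeTextFile(text, out)); // true if the file was written
    }
}
```

Whatever the encoding, the written file still holds only characters and line breaks, so fonts, sizes, and positions from the PDF are not carried over.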
Prasad

PDFBox will help you with this, although it may lose some formatting, as Erick Robertson said.

Refer to "PDF Text Parser: Converting PDF to Text in Java using PDFBox".

Hemant Metalia