I have prepared an editable PDF form, but I am unable to convert the values of its editable fields into text using Java.

APIs tried: pdfbox-app-2.0.0-RC2, PDFBox-0.7.3, itextpdf-5.1.0, pdfclown.

Please help me find out how to convert PDF editable fields into text in Java.

Java program used (it converts normal PDF content into text, but not the editable fields):

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import javax.swing.JFileChooser;
import javax.swing.filechooser.FileNameExtensionFilter;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class PdfConvertor_1 {
    public static void main(String[] args) {
        selectPDFFiles();
    }

    // Let the user pick one or more PDF files to convert
    public static void selectPDFFiles() {
        JFileChooser chooser = new JFileChooser();
        FileNameExtensionFilter filter = new FileNameExtensionFilter("PDF", "pdf");
        chooser.setFileFilter(filter);
        chooser.setMultiSelectionEnabled(true);
        int returnVal = chooser.showOpenDialog(null);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            File[] files = chooser.getSelectedFiles();
            System.out.println("Please wait...");
            for (int i = 0; i < files.length; i++) {
                convertPDFToText(files[i].toString(), "textfrompdf" + i + ".txt");
            }
            System.out.println("Conversion complete");
        }
    }

    // Extract the page text of src and write it to desc.
    // Note: this only reads page content, not form field values.
    public static void convertPDFToText(String src, String desc) {
        PdfReader pr = null;
        // Write to desc (it was ignored before, so every file overwrote D:\POC_Pdf2.txt)
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(desc))) {
            pr = new PdfReader(src);
            // Get the number of pages in the document
            int pNum = pr.getNumberOfPages();
            // Extract text from each page and write it to the output text file
            for (int page = 1; page <= pNum; page++) {
                String text = PdfTextExtractor.getTextFromPage(pr, page);
                bw.write(text);
                bw.newLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (pr != null) {
                pr.close();
            }
        }
    }
}

Please check the editable fields in the image; these are what I want to convert into text using Java.

Milan
  • So essentially you want to read the values entered into the form fields? – mkl Dec 22 '15 at 19:51
  • I am not sure whether that field is called a form field or not. I have attached an image; please check the editable field and kindly advise. Thanks. – Milan Dec 23 '15 at 06:59
  • @Bruno's answer shows how to read field values using iText. It is comparably easy with PDFBox or PDFClown. I wonder, though, why you used so old releases of iText and PDFBox. Current iText release is 5.5.8 and current PDFBox release (before the upcoming 2.0.0) is 1.8.10. – mkl Dec 23 '15 at 08:36
  • pdfbox-app-2.0.0-RC2 *and* PDFBox-0.7.3 ? Very weird. Although I don't see any pdfbox code. – Tilman Hausherr Dec 23 '15 at 13:37

1 Answer

Fields are not part of the page content stream, hence "getting text from a page" won't give you the value of a field.

You need to get the form from the PDF. A form is referred to from the root dictionary of a PDF, but there's a convenience method to get an AcroFields object. This question was already answered for people who are using iTextSharp / C#: How to read PDF form data using iTextSharp?

PdfReader reader = new PdfReader(path_to_your_completed_form);
AcroFields fields = reader.getAcroFields();
String value = fields.getField(key);

In this snippet, path_to_your_completed_form is the full path you get from your JFileChooser, and key is the name of one of the fields defined in your form.

If you don't know which fields are defined in your form, please read the answer to the question How to get specific types from AcroFields? Like PushButtonField, RadioCheckField, etc? There's some code in that example that allows you to loop over the available fields and that informs you if a field is a text field, a check box, a radio button, and so on.
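Once the field values have been read, turning them into text is just a matter of writing out name/value pairs. The following is a minimal sketch of that last step, not a definitive implementation: the class name FieldDump, the "name=value" output format, and the sample field names are all assumptions made for illustration. It uses only the JDK so it can be run on its own; the comment in main shows how the map would presumably be filled with iText 5 via getAcroFields(), getFields() and getField().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: FieldDump and the field names below are hypothetical.
class FieldDump {

    // Turns field name/value pairs into "name=value" lines, one per field.
    // A null value (field without a value) is written as an empty string.
    static String format(Map<String, String> values) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : values.entrySet()) {
            String value = e.getValue() == null ? "" : e.getValue();
            sb.append(e.getKey()).append('=').append(value)
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // With iText 5 on the classpath, the map would be filled from the form:
        //   AcroFields fields = reader.getAcroFields();
        //   for (String name : fields.getFields().keySet())
        //       values.put(name, fields.getField(name));
        Map<String, String> values = new LinkedHashMap<>();
        values.put("firstName", "Milan"); // hypothetical field name and value
        values.put("comments", null);     // hypothetical field left empty
        System.out.print(format(values));
    }
}
```

The string returned by format can be passed to the BufferedWriter from the question's convertPDFToText method, so page text and field values end up in the same output file.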

Bruno Lowagie