The idea is next,
user selects a pdf file, and then this file converted into an image and such an image is displayed in the application.
In the image the user can choose positions that wants to read from a pdf file, and when the finish with selection position in the background program reads the original pdf and text stored in a txt file.
It is important that the resulting image from pdf file is the same size as himself pdf file
The next code convert pdf to image. I use pdfrenderer-0.9.1.jar
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import javax.imageio.ImageIO;
import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;
public class Pdf2Image {
public static void main(String[] args) {
File file = new File("E:\\invoice-template-1.pdf");
RandomAccessFile raf;
try {
raf = new RandomAccessFile(file, "r");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
PDFFile pdffile = new PDFFile(buf);
// draw the first page to an image
int num=pdffile.getNumPages();
for(int i=0;i<num;i++)
{
PDFPage page = pdffile.getPage(i);
//get the width and height for the doc at the default zoom
int width=(int)page.getBBox().getWidth();
int height=(int)page.getBBox().getHeight();
Rectangle rect = new Rectangle(0,0,width,height);
int rotation=page.getRotation();
Rectangle rect1=rect;
if(rotation==90 || rotation==270)
rect1=new Rectangle(0,0,rect.height,rect.width);
//generate the image
BufferedImage img = (BufferedImage)page.getImage(
rect.width, rect.height, //width & height
rect1, // clip rect
null, // null for the ImageObserver
true, // fill background with white
true // block until drawing is done
);
ImageIO.write(img, "png", new File("E:/invoice-template-"+i+".png"));
}
}
catch (FileNotFoundException e1) {
System.err.println(e1.getLocalizedMessage());
} catch (IOException e) {
System.err.println(e.getLocalizedMessage());
}
}
}
Then the image is displayed to the user in JavaFX application in ImageView components. Can you help me to get the exact position of the mouse, the mouse when the user selects a portion of the image from which you want to read the text in the pdf file?
With this code I read pdf file and get text from the set position, only I must to manually input position:( . I use pdfbox-1.3.1.jar. I would like to position the client chooses to keep a picture in the list and read the text from the pdf file with all of these positions.
File file = new File("E:/invoice-template-1.pdf");
PDDocument document = PDDocument.load(file);
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
Rectangle rect1 = new Rectangle(38, 275, 15, 100);
Rectangle rect2 = new Rectangle(54, 275, 40, 100);
stripper.addRegion("row1column1", rect1);
stripper.addRegion("row1column2", rect2);
List allPages = document.getDocumentCatalog().getAllPages();
List<PDPage> pages = document.getDocumentCatalog().getAllPages();
int j = 0;
for (PDPage page : pages) {
stripper.extractRegions(page);
stripper.setSortByPosition(true);
List<String> regions = stripper.getRegions();
for (String region : regions) {
String text = stripper.getTextForRegion(region);
System.out.println("Region: " + region + " on Page " + j);
System.out.println("\tText: \n" + text);
}
For example, in the next invoice, I want to select the 4 positions to export the text, and when you select the picture, the dimensions of keeping in the list, then go through the list and from those positions export text from pdf file.