I am trying to check whether the text is BOXED using apache PDFBOX. for few PDF the below code wont work.
public class PDFBoxReader extends PDFGraphicsStreamEngine {
private static ArrayList<Rectangle2D> recList = new ArrayList<Rectangle2D>();
public PDFBoxReader(PDPage page) {
super(page);
}
public static boolean isTextBoxed(PDDocument document, String text) {
StingBuffer boxedText = new StringBuffer();
for (PDPage page : document.getPages()) {
PDFBoxReader reader = new PDFBoxReader(page);
try {
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
rectList = new ArrayList<Rectangle2D>();
reader.processPage(page);
for (Rectangle2D react : rectList) {
Double y = page.cropBox().getUpperRightY() - rect.getY() - rect.getHeight();
rect.setRect(rect.getX(), y, rect.getWidth(), rect.getWidth(), rect.getHeight());
stripper.addRegion("box", rect);
stripper.extractRegions(page);
boxedText.append(stripper.getTextForRegion("box"));
}
if (isTextMatched(text, stripper.getTextForRegion("box"))) {
return true;
}
} catch (IoException exception) {
// exception is handled here
}
}
}
// some more methods here
}
PDF dont have any acroform. it has a paragraph in a bordered box at the end of the page.