
I am trying to use Apache PDFBox to check whether a piece of text is boxed, i.e. drawn inside a bordered rectangle. For a few PDFs, the code below doesn't work.

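import java.awt.geom.Rectangle2D;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

public class PDFBoxReader extends PDFGraphicsStreamEngine {

    // Rectangles found on the current page, filled by the drawing callbacks
    // below. One list per reader instead of a shared static list.
    private final List<Rectangle2D> rectList = new ArrayList<>();

    public PDFBoxReader(PDPage page) {
        super(page);
    }

    public static boolean isTextBoxed(PDDocument document, String text) {
        for (PDPage page : document.getPages()) {
            PDFBoxReader reader = new PDFBoxReader(page);
            try {
                // Run the graphics engine so the callbacks fill reader.rectList
                reader.processPage(page);

                PDFTextStripperByArea stripper = new PDFTextStripperByArea();
                float pageTop = page.getCropBox().getUpperRightY();
                int i = 0;
                for (Rectangle2D rect : reader.rectList) {
                    // PDF user space has y pointing up; regions are measured from the top
                    double y = pageTop - rect.getY() - rect.getHeight();
                    // Give each box its own region name so none is overwritten
                    stripper.addRegion("box" + i, new Rectangle2D.Double(
                            rect.getX(), y, rect.getWidth(), rect.getHeight()));
                    i++;
                }
                // Extract all regions in one pass
                stripper.extractRegions(page);

                // Check every region, not only the last one added
                for (String region : stripper.getRegions()) {
                    if (isTextMatched(text, stripper.getTextForRegion(region))) {
                        return true;
                    }
                }
            } catch (IOException exception) {
                // exception is handled here
            }
        }
        return false;
    }

    // some more methods here
}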
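The y-flip inside the loop is the step that is easiest to get wrong, so here it is isolated as a small pure-Java helper. This is a sketch: the class name `RegionCoords` is hypothetical, `pageTop` is assumed to be the crop box's upper-right Y (as in the question), and the crop box's lower-left corner is assumed to sit at the origin, so x is left unchanged.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper: converts a rectangle from PDF user space (origin at the
// bottom-left, y pointing up) into top-down coordinates for a
// PDFTextStripperByArea region.
class RegionCoords {
    static Rectangle2D toRegion(Rectangle2D rect, float pageTop) {
        // The rectangle's top edge (y + height) becomes its distance from the page top
        double y = pageTop - rect.getY() - rect.getHeight();
        // A Rectangle2D is defined by exactly four values: x, y, width, height
        return new Rectangle2D.Double(rect.getX(), y, rect.getWidth(), rect.getHeight());
    }
}
```

For a page whose crop box top is at 792, a box at y = 100 with height 50 maps to a region that starts 642 units from the top.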

The PDF doesn't have any AcroForm fields. It has a paragraph inside a bordered box at the end of the page.
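One possible reason (an assumption, not confirmed in the question) that the rectangle list stays empty for some PDFs is that their border is not drawn with the `re` rectangle operator but as a path of line segments, which only reaches `moveTo`/`lineTo` rather than `appendRectangle`. The sketch below is pure `java.awt.geom` with a hypothetical `BoxCollector` class whose method names and parameters mirror the `PDFGraphicsStreamEngine` callbacks, so each override in `PDFBoxReader` could simply forward to it. It records the bounding box of every painted path with non-zero width and height.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the bounds of each stroked or filled path.
// Method names mirror PDFGraphicsStreamEngine callbacks for easy forwarding.
class BoxCollector {
    private final List<Rectangle2D> boxes = new ArrayList<>();
    private Path2D.Double current = new Path2D.Double();

    // 're' operator: add the four corners as a closed subpath
    void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) {
        current.moveTo(p0.getX(), p0.getY());
        current.lineTo(p1.getX(), p1.getY());
        current.lineTo(p2.getX(), p2.getY());
        current.lineTo(p3.getX(), p3.getY());
        current.closePath();
    }

    // 'm', 'l', 'c' and 'h' operators: build the path segment by segment
    void moveTo(float x, float y) { current.moveTo(x, y); }
    void lineTo(float x, float y) { current.lineTo(x, y); }
    void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) {
        current.curveTo(x1, y1, x2, y2, x3, y3);
    }
    void closePath() { current.closePath(); }

    // Painting operators: keep the path's bounds, then start a new path
    void strokePath() { finishPath(true); }
    void fillPath(int windingRule) { finishPath(true); }
    void fillAndStrokePath(int windingRule) { finishPath(true); }

    // 'n' operator (e.g. after a clip): nothing is painted, so discard the path
    void endPath() { finishPath(false); }

    private void finishPath(boolean keep) {
        Rectangle2D bounds = current.getBounds2D();
        // Skip degenerate paths such as a single horizontal or vertical rule.
        // Limitation: a border drawn as four separately stroked lines is NOT
        // merged into one box by this sketch.
        if (keep && bounds.getWidth() > 0 && bounds.getHeight() > 0) {
            boxes.add(bounds);
        }
        current = new Path2D.Double();
    }

    List<Rectangle2D> getBoxes() { return boxes; }
}
```

If the border in the failing PDFs turns out to be four separate strokes or four thin filled rectangles, the collected segments would still need to be merged into one enclosing rectangle before building the text regions.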

Arunkumar