When you want to create a visible signature using PDFBox, you need to create a Rectangle2D object:

Rectangle2D humanRect = new Rectangle2D.Float(100, 200, 150, 50);

I would like to know whether it is possible to find all the white spaces (empty rectangles) of a certain size (width x height) in the document, or at least on the first or last page. I would like to choose one of these positions for my signature form.

I would like to use it as in the following example:

Rectangle2D humanRect = new Rectangle2D.Float(foundX, foundY, width, height);

3de
  • You mean you are looking for something like a port of [the `FreeSpaceFinderExt` class from this iText answer](https://stackoverflow.com/a/26503289/1729265) to PDFBox? – mkl May 02 '22 at 15:41
  • Why not render the page, then look for white pixels and find white rectangles from that? – Tilman Hausherr May 03 '22 at 09:18
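
The rendering approach suggested in the last comment could be sketched roughly as follows. This is only an illustration under assumptions: the page has already been rendered to a BufferedImage (for instance with PDFBox's PDFRenderer, not shown here), "white" means every RGB channel is at least a chosen threshold, and the names WhiteRectScanner and findWhiteRect are hypothetical. The sketch builds a summed-area table of non-white pixels so that each candidate window can be checked in constant time.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

final class WhiteRectScanner {
    /**
     * Returns the first all-white w x h window in row-major scan order
     * (top to bottom, left to right), or null if there is none.
     */
    static Rectangle findWhiteRect(BufferedImage img, int w, int h, int threshold) {
        int imgW = img.getWidth();
        int imgH = img.getHeight();
        if (w <= 0 || h <= 0 || w > imgW || h > imgH) {
            return null;
        }
        // sums[y][x] = number of non-white pixels in the region [0, x) x [0, y)
        int[][] sums = new int[imgH + 1][imgW + 1];
        for (int y = 0; y < imgH; y++) {
            for (int x = 0; x < imgW; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int nonWhite = (r >= threshold && g >= threshold && b >= threshold) ? 0 : 1;
                sums[y + 1][x + 1] = nonWhite + sums[y][x + 1] + sums[y + 1][x] - sums[y][x];
            }
        }
        // Slide the window over the image; a count of 0 means the window is entirely white.
        for (int y = 0; y + h <= imgH; y++) {
            for (int x = 0; x + w <= imgW; x++) {
                int count = sums[y + h][x + w] - sums[y][x + w] - sums[y + h][x] + sums[y][x];
                if (count == 0) {
                    return new Rectangle(x, y, w, h);
                }
            }
        }
        return null;
    }
}
```

The returned rectangle is in image pixels with the origin at the top left, so it would still have to be scaled back to page units and flipped vertically before it could serve as a signature rectangle.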

1 Answer

As already confirmed in a comment to the question, you are essentially looking for a PDFBox port of the functionality of the FreeSpaceFinder and FreeSpaceFinderExt classes for iText from the answer linked there (https://stackoverflow.com/a/26503289/1729265). That port is the focus of this answer.

If you want to determine something from the content stream instructions of a page with PDFBox, you will usually create a class based on PDFStreamEngine or one of its subclasses. For anything that's not focusing on text extraction most often the PDFGraphicsStreamEngine is the base class of choice.

Based on that we can essentially copy the functionality of the mentioned iText based classes:

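// Imports assume the PDFBox 2.x package layout.
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.awt.geom.Path2D;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

import org.apache.fontbox.util.BoundingBox;
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDCIDFontType2;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDSimpleFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.font.PDType3CharProc;
import org.apache.pdfbox.pdmodel.font.PDType3Font;
import org.apache.pdfbox.pdmodel.font.PDVectorFont;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

public class FreeSpaceFinder extends PDFGraphicsStreamEngine {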
    //
    // constructors
    //
    public FreeSpaceFinder(PDPage page, float minWidth, float minHeight) {
        this(page, page.getCropBox().toGeneralPath().getBounds2D(), minWidth, minHeight);
    }

    public FreeSpaceFinder(PDPage page, Rectangle2D initialBox, float minWidth, float minHeight) {
        this(page, Collections.singleton(initialBox), minWidth, minHeight);
    }

    public FreeSpaceFinder(PDPage page, Collection<Rectangle2D> initialBoxes, float minWidth, float minHeight) {
        super(page);

        this.minWidth = minWidth;
        this.minHeight = minHeight;
        this.freeSpaces = initialBoxes;
    }

    //
    // Result
    //
    public Collection<Rectangle2D> getFreeSpaces() {
        return freeSpaces;
    }

    //
    // Text
    //
    @Override
    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, Vector displacement)
            throws IOException {
        super.showGlyph(textRenderingMatrix, font, code, displacement);
        Shape shape = calculateGlyphBounds(textRenderingMatrix, font, code);
        if (shape != null) {
            Rectangle2D rect = shape.getBounds2D();
            remove(rect);
        }
    }

    /**
     * Copy of <code>org.apache.pdfbox.examples.util.DrawPrintTextLocations.calculateGlyphBounds(Matrix, PDFont, int)</code>.
     */
    private Shape calculateGlyphBounds(Matrix textRenderingMatrix, PDFont font, int code) throws IOException
    {
        GeneralPath path = null;
        AffineTransform at = textRenderingMatrix.createAffineTransform();
        at.concatenate(font.getFontMatrix().createAffineTransform());
        if (font instanceof PDType3Font)
        {
            // It is difficult to calculate the real individual glyph bounds for type 3 fonts
            // because these are not vector fonts, the content stream could contain almost anything
            // that is found in page content streams.
            PDType3Font t3Font = (PDType3Font) font;
            PDType3CharProc charProc = t3Font.getCharProc(code);
            if (charProc != null)
            {
                BoundingBox fontBBox = t3Font.getBoundingBox();
                PDRectangle glyphBBox = charProc.getGlyphBBox();
                if (glyphBBox != null)
                {
                    // PDFBOX-3850: glyph bbox could be larger than the font bbox
                    glyphBBox.setLowerLeftX(Math.max(fontBBox.getLowerLeftX(), glyphBBox.getLowerLeftX()));
                    glyphBBox.setLowerLeftY(Math.max(fontBBox.getLowerLeftY(), glyphBBox.getLowerLeftY()));
                    glyphBBox.setUpperRightX(Math.min(fontBBox.getUpperRightX(), glyphBBox.getUpperRightX()));
                    glyphBBox.setUpperRightY(Math.min(fontBBox.getUpperRightY(), glyphBBox.getUpperRightY()));
                    path = glyphBBox.toGeneralPath();
                }
            }
        }
        else if (font instanceof PDVectorFont)
        {
            PDVectorFont vectorFont = (PDVectorFont) font;
            path = vectorFont.getPath(code);

            if (font instanceof PDTrueTypeFont)
            {
                PDTrueTypeFont ttFont = (PDTrueTypeFont) font;
                int unitsPerEm = ttFont.getTrueTypeFont().getHeader().getUnitsPerEm();
                at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
            }
            if (font instanceof PDType0Font)
            {
                PDType0Font t0font = (PDType0Font) font;
                if (t0font.getDescendantFont() instanceof PDCIDFontType2)
                {
                    int unitsPerEm = ((PDCIDFontType2) t0font.getDescendantFont()).getTrueTypeFont().getHeader().getUnitsPerEm();
                    at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
                }
            }
        }
        else if (font instanceof PDSimpleFont)
        {
            PDSimpleFont simpleFont = (PDSimpleFont) font;

            // these two lines do not always work, e.g. for the TT fonts in file 032431.pdf
            // which is why PDVectorFont is tried first.
            String name = simpleFont.getEncoding().getName(code);
            path = simpleFont.getPath(name);
        }
        else
        {
            // shouldn't happen, please open issue in JIRA
            System.out.println("Unknown font class: " + font.getClass());
        }
        if (path == null)
        {
            return null;
        }
        return at.createTransformedShape(path.getBounds2D());
    }

    //
    // Bitmaps
    //
    @Override
    public void drawImage(PDImage pdImage) throws IOException {
        Matrix ctm = getGraphicsState().getCurrentTransformationMatrix();
        Rectangle2D unitSquare = new Rectangle2D.Float(0, 0, 1, 1);
        Path2D path = new Path2D.Float(unitSquare);
        path.transform(ctm.createAffineTransform());
        remove(path.getBounds2D());
    }

    //
    // Paths
    //
    @Override
    public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException {
        currentPath.moveTo(p0.getX(), p0.getY());
        currentPath.lineTo(p1.getX(), p1.getY());
        currentPath.lineTo(p2.getX(), p2.getY());
        currentPath.lineTo(p3.getX(), p3.getY());
        currentPath.closePath();
    }

    @Override
    public void clip(int windingRule) throws IOException {
        // ignore
    }

    @Override
    public void moveTo(float x, float y) throws IOException {
        currentPath.moveTo(x, y);
    }

    @Override
    public void lineTo(float x, float y) throws IOException {
        currentPath.lineTo(x, y);
    }

    @Override
    public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException {
        currentPath.curveTo(x1, y1, x2, y2, x3, y3);
    }

    @Override
    public Point2D getCurrentPoint() throws IOException {
        // To prevent many warnings...
        return new Point2D.Float();
    }

    @Override
    public void closePath() throws IOException {
        currentPath.closePath();
    }

    @Override
    public void endPath() throws IOException {
        currentPath = new Path2D.Float();
    }

    @Override
    public void strokePath() throws IOException {
        // Better only remove the bounding boxes of the constituting strokes
        remove(currentPath.getBounds2D());
        currentPath = new Path2D.Float();
    }

    @Override
    public void fillPath(int windingRule) throws IOException {
        // Better only remove the bounding boxes of the constituting subpaths
        remove(currentPath.getBounds2D());
        currentPath = new Path2D.Float();
    }

    @Override
    public void fillAndStrokePath(int windingRule) throws IOException {
        // Better only remove the bounding boxes of the constituting subpaths
        remove(currentPath.getBounds2D());
        currentPath = new Path2D.Float();
    }

    @Override
    public void shadingFill(COSName shadingName) throws IOException {
        // ignore
    }

    //
    // helpers
    //
    void remove(Rectangle2D usedSpace)
    {
        final double minX = usedSpace.getMinX();
        final double maxX = usedSpace.getMaxX();
        final double minY = usedSpace.getMinY();
        final double maxY = usedSpace.getMaxY();

        final Collection<Rectangle2D> newFreeSpaces = new ArrayList<Rectangle2D>();

        for (Rectangle2D freeSpace: freeSpaces)
        {
            final Collection<Rectangle2D> newFragments = new ArrayList<Rectangle2D>();
            if (freeSpace.intersectsLine(minX, minY, maxX, minY))
                newFragments.add(new Rectangle2D.Double(freeSpace.getMinX(), freeSpace.getMinY(), freeSpace.getWidth(), minY-freeSpace.getMinY()));
            if (freeSpace.intersectsLine(minX, maxY, maxX, maxY))
                newFragments.add(new Rectangle2D.Double(freeSpace.getMinX(), maxY, freeSpace.getWidth(), freeSpace.getMaxY() - maxY));
            if (freeSpace.intersectsLine(minX, minY, minX, maxY))
                newFragments.add(new Rectangle2D.Double(freeSpace.getMinX(), freeSpace.getMinY(), minX - freeSpace.getMinX(), freeSpace.getHeight()));
            if (freeSpace.intersectsLine(maxX, minY, maxX, maxY))
                newFragments.add(new Rectangle2D.Double(maxX, freeSpace.getMinY(), freeSpace.getMaxX() - maxX, freeSpace.getHeight()));
            if (newFragments.isEmpty())
            {
                add(newFreeSpaces, freeSpace);
            }
            else
            {
                for (Rectangle2D fragment: newFragments)
                {
                    if (fragment.getHeight() >= minHeight && fragment.getWidth() >= minWidth)
                    {
                        add(newFreeSpaces, fragment);
                    }
                }
            }
        }

        freeSpaces = newFreeSpaces;
    }

    void add(Collection<Rectangle2D> rectangles, Rectangle2D addition)
    {
        final Collection<Rectangle2D> toRemove = new ArrayList<Rectangle2D>();
        boolean isContained = false;
        for (Rectangle2D rectangle: rectangles)
        {
            if (rectangle.contains(addition))
            {
                isContained = true;
                break;
            }
            if (addition.contains(rectangle))
                toRemove.add(rectangle);
        }
        rectangles.removeAll(toRemove);
        if (!isContained)
            rectangles.add(addition);
    }

    //
    // hidden members
    //
    Path2D currentPath = new Path2D.Float();
    Collection<Rectangle2D> freeSpaces = null;
    final float minWidth;
    final float minHeight;
}

(FreeSpaceFinder)

Using this FreeSpaceFinder you can find empty areas with given minimum dimensions in a method like this:

public Collection<Rectangle2D> find(PDDocument pdDocument, PDPage pdPage, float minWidth, float minHeight) throws IOException {
    FreeSpaceFinder finder = new FreeSpaceFinder(pdPage, minWidth, minHeight);
    finder.processPage(pdPage);
    return finder.getFreeSpaces();
}

(DetermineFreeSpaces method find)
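
To get from the returned collection to the `new Rectangle2D.Float(foundX, foundY, width, height)` call in the question, one possible selection step is sketched below. The helper name placeIn and the strategy "take the lowest, then leftmost area that fits" are assumptions made for illustration and are not part of the code above. The result is expressed in the same coordinates the finder reports.

```java
import java.awt.geom.Rectangle2D;
import java.util.Collection;

final class SignaturePlacement {
    /**
     * Picks the free area with the smallest y (ties broken by smallest x) that can
     * hold a width x height box, and returns a box of exactly that size anchored
     * at the area's lower-left corner. Returns null if no area is large enough.
     */
    static Rectangle2D placeIn(Collection<Rectangle2D> freeSpaces, float width, float height) {
        Rectangle2D best = null;
        for (Rectangle2D candidate : freeSpaces) {
            if (candidate.getWidth() < width || candidate.getHeight() < height) {
                continue; // too small for the signature
            }
            boolean lower = best == null
                    || candidate.getMinY() < best.getMinY()
                    || (candidate.getMinY() == best.getMinY() && candidate.getMinX() < best.getMinX());
            if (lower) {
                best = candidate;
            }
        }
        if (best == null) {
            return null;
        }
        return new Rectangle2D.Float((float) best.getMinX(), (float) best.getMinY(), width, height);
    }
}
```

Other strategies (largest area, closest to a preferred corner) would only change the comparison inside the loop.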

Applied to the same PDF page as the iText-centric solution, with a minimum width of 200 and a minimum height of 50, we get:

screen shot

Compared to the analogous screen shot for the iText variant, we get more candidate rectangles here.

This is because the iText solution uses the font-level ascender and descender, whereas here we use the bounding boxes of the individual glyphs.
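
For some intuition about what the remove helper does with a single piece of content, the stand-alone sketch below splits one free rectangle around one used bounding box into the same four kinds of fragments (below, above, left, right). It is a simplified illustration rather than a replacement: the name SplitDemo and the overlap test via intersects are assumptions, zero-area bounding boxes are not handled, and the minimum-size filter and the containment bookkeeping of add are left out.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

final class SplitDemo {
    /** Splits 'free' into the maximal parts lying below, above, left and right of 'used'. */
    static List<Rectangle2D> split(Rectangle2D free, Rectangle2D used) {
        List<Rectangle2D> parts = new ArrayList<>();
        if (!free.intersects(used)) {
            parts.add(free); // nothing drawn here, the free area survives unchanged
            return parts;
        }
        if (used.getMinY() > free.getMinY()) { // strip below the used box
            parts.add(new Rectangle2D.Double(free.getMinX(), free.getMinY(),
                    free.getWidth(), used.getMinY() - free.getMinY()));
        }
        if (used.getMaxY() < free.getMaxY()) { // strip above the used box
            parts.add(new Rectangle2D.Double(free.getMinX(), used.getMaxY(),
                    free.getWidth(), free.getMaxY() - used.getMaxY()));
        }
        if (used.getMinX() > free.getMinX()) { // strip to the left
            parts.add(new Rectangle2D.Double(free.getMinX(), free.getMinY(),
                    used.getMinX() - free.getMinX(), free.getHeight()));
        }
        if (used.getMaxX() < free.getMaxX()) { // strip to the right
            parts.add(new Rectangle2D.Double(used.getMaxX(), free.getMinY(),
                    free.getMaxX() - used.getMaxX(), free.getHeight()));
        }
        return parts;
    }
}
```

For a free area of 100 x 100 at the origin and a used box of 20 x 20 at (40, 40), this yields four overlapping strips: two of full width (below and above the box) and two of full height (left and right of it). The overlap is intentional, since each strip is as large as it can be without touching the used box.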

mkl
  • *For anything that's not focusing on text extraction most often the PDFGraphicsStreamEngine is the base class of choice.* Does this mean that I can't use this solution for PDF documents that contain only text? – 3de May 26 '22 at 20:40
  • @Adelin *"Does this mean that I can't use this solution for PDF documents that contain only text?"* - You can use it. I only wanted to explain my choice of base class. In the code here I'm not interested in text *extraction*, merely the *position* where text (or other content) is, is of interest. – mkl May 27 '22 at 06:15
  • Your solution works perfectly, but I would like to change the starting points of the rectangles to the upper left corner (at the moment they are in the bottom left). – 3de May 28 '22 at 23:32
  • What do you mean by *change the starting point*? The `Rectangle2D` class allows you to retrieve the coordinates of each corner of the rectangle. – mkl May 29 '22 at 06:30
  • The reference point of the rectangle coordinates is the bottom left corner of the page, i.e. the origin of the xy axis is the bottom left corner of the page. I would like the reference point to be the top left corner of the page. – 3de May 29 '22 at 08:47
  • *"The reference point of the rectangle coordinates is the bottom left corner of the page, i.e. the origin of the xy axis is the bottom left corner of the page."* - Don't count on that. The origin of the default user space of a page may be anywhere, depending on the media box and crop box settings of the page. It is merely often in the lower left corner because that simplifies matters. *"I would like the reference point to be the top left corner of the page."* - Then create the original page that way, i.e. with a crop box with a lower **x** coordinate of 0 and an upper **y** coordinate of 0. – mkl May 29 '22 at 09:21
  • *Then create the original page that way*. I'm trying to find free spaces (rectangles) in an already created PDF. The problem occurs when I try to position the signature widget in the found space. – 3de May 29 '22 at 09:46
  • Then I don't understand your problem. The rectangle coordinates you get from the code should be exactly the coordinates for the **Rect** of a widget annotation. – mkl May 29 '22 at 09:53
  • `Collection rectangles = freeSpaceFinder.getFreeSpaces(); ArrayList rectangle2DArrayList = new ArrayList<>(rectangles); humanRect = rectangle2DArrayList.get(0);//get first rectangle rect = createSignatureRectangle(doc, humanRect); //https://svn.apache.org/viewvc/pdfbox/trunk/examples/src/main/java/org/apache/pdfbox/examples/signature/CreateVisibleSignature2.java?view=markup#:~:text=private%20PDRectangle-,createSignatureRectangle,-(PDDocument%20doc` That's what I'm trying to do, but the created rectangle has a weird position. – 3de May 29 '22 at 12:47
  • Yikes. Ugly. But that's only a PDFBox example after all. I'd propose changing it to not use this `createSignatureRectangle` method with its twisted *humanRect* way of thinking. Instead create a `PDRectangle` directly with the position and dimensions of the chosen `Rectangle2D`. – mkl May 29 '22 at 13:39
  • Can you please explain how FreeSpaceFinder works? – 3de Jun 11 '22 at 13:15
  • It starts with the crop box as the only rectangle in `freeSpaces`. Then the page is processed, and whenever something is drawn on the page, its bounding box is considered: for each rectangle in `freeSpaces` that intersects that bounding box, the rectangle is removed from `freeSpaces` and the parts of it above, below, left, and right of the bounding box are added. After the whole page is processed, the rectangles in `freeSpaces` don't intersect any content bounding box and are maximal with that property. – mkl Jun 11 '22 at 17:12
  • But what do you mean by "_are maximal with that property_"? – 3de Jun 11 '22 at 18:22
  • *"But what do you mean by 'are maximal with that property'?"* - It means that if you take any of these rectangles and enlarge it (inside the crop box), it will not have that property anymore, i.e. it will intersect some content bounding box. – mkl Jun 11 '22 at 19:08
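
On the coordinate discussion in the comments above: if a rectangle measured from the top left is really needed (for example to feed code that thinks in "humanRect" terms), the conversion for an unrotated page could be sketched as below. The sketch assumes the crop box is available as a Rectangle2D in default user space and that the page has no rotation; the names TopLeftCoords, toTopLeft and toUserSpace are hypothetical. For the widget's **Rect** itself, the rectangle from the finder can be used directly, as mkl points out.

```java
import java.awt.geom.Rectangle2D;

final class TopLeftCoords {
    /** Converts a user-space rectangle (y pointing up) to one measured from the crop box's top-left corner (y pointing down). */
    static Rectangle2D toTopLeft(Rectangle2D rect, Rectangle2D cropBox) {
        double x = rect.getMinX() - cropBox.getMinX();
        double y = cropBox.getMaxY() - rect.getMaxY();
        return new Rectangle2D.Double(x, y, rect.getWidth(), rect.getHeight());
    }

    /** Inverse of toTopLeft: converts a top-left-based rectangle back to user space. */
    static Rectangle2D toUserSpace(Rectangle2D topLeft, Rectangle2D cropBox) {
        double x = cropBox.getMinX() + topLeft.getMinX();
        double y = cropBox.getMaxY() - topLeft.getMinY() - topLeft.getHeight();
        return new Rectangle2D.Double(x, y, topLeft.getWidth(), topLeft.getHeight());
    }
}
```

Because both methods take the crop box explicitly, they also work when the crop box does not start at the origin, which is the situation the comments warn about.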