12

The Situation:
In PDFBox, PDRectangle objects' default origin (0,0) seems to be the lower-left corner of a page.

For example, the following code gives you a square at the lower-left corner of a page, and each side is 100 units long.

PDRectangle rectangle = new PDRectangle(0, 0, 100, 100);

The Question:
Is it possible to change the origin to the UPPER-LEFT corner, so that, for example, the code above will give you the same square at the UPPER-LEFT corner of the page instead?

The reason I ask:
I was using PDFTextStripper to get the coordinates of the text (by using the getX() and getY() methods of the extracted TextPosition objects). The coordinates retrieved from TextPosition objects seem to have an origin (0,0) at the UPPER-LEFT CORNER. I want the coordinates of my PDRectangle objects to have the same origin as the coordinates of my TextPosition objects.

I have tried to adjust the Y-coordinates of my PDRectangle by "page height minus Y-coordinate". This gives me the desired result, but it's not elegant. I want an elegant solution.

Note: Someone has asked a similar question ("how to change the coordiantes of a text in a pdf page from lower left to upper left"). The answer there is what I tried, which is not the most elegant.

Brian

4 Answers

24

You can change coordinate systems somewhat but most likely things won't get more elegant in the end.

To start with...

First of all let's clear up some misconception:

You assume

In PDFBox, PDRectangle objects' default origin (0,0) seems to be the lower-left corner of a page.

This is not true for all cases, merely often.

The area containing the displayed page area (on paper or on screen) usually is defined by the CropBox entry of the page in question:

CropBox rectangle (Optional; inheritable) A rectangle, expressed in default user space units, that shall define the visible region of default user space. When the page is displayed or printed, its contents shall be clipped (cropped) to this rectangle and then shall be imposed on the output medium in some implementation-defined manner.

... The positive x axis extends horizontally to the right and the positive y axis vertically upward, as in standard mathematical practice (subject to alteration by the Rotate entry in the page dictionary).

... In PostScript, the origin of default user space always corresponds to the lower-left corner of the output medium. While this convention is common in PDF documents as well, it is not required; the page dictionary’s CropBox entry can specify any rectangle of default user space to be made visible on the medium.

Thus, the origin (0,0) can literally be anywhere, it may be at the lower left, at the upper left, in the middle of the page or even far outside the displayed page area.

And by means of the Rotate entry, that area can even be rotated (by 90°, 180°, or 270°).

Putting the origin (as you seem to have observed) in the lower left merely is done by convention.

Furthermore, you seem to think that the coordinate system is constant. This is not the case either: there are operations by which you can transform the user space coordinate system drastically. You can translate, rotate, mirror, skew, and/or scale it!

Thus, even if at the beginning the coordinate system is the usual one, origin in lower left, x-axis going right, y-axis going up, it may be changed to something weird some way into the page content description. Drawing your rectangle new PDRectangle(0, 0, 100, 100) there might produce some rhomboid form just right of the page center.
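To make this concrete without involving PDFBox at all, here is a minimal sketch using only java.awt.geom. It maps the corners of the 100 x 100 square through a made-up current transformation matrix; the matrix values and the class name SkewedSquareDemo are purely illustrative assumptions, not something taken from a real page.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class SkewedSquareDemo {
    // Hypothetical CTM: shear x by 0.5 * y, then move the origin to (300, 400).
    // Argument order is (m00, m10, m01, m11, m02, m12).
    static final AffineTransform CTM = new AffineTransform(1, 0, 0.5, 1, 300, 400);

    /** Maps the four corners of the rectangle (x, y, w, h) through the given transform. */
    static Point2D[] corners(AffineTransform ctm, double x, double y, double w, double h) {
        Point2D[] src = {
            new Point2D.Double(x, y), new Point2D.Double(x + w, y),
            new Point2D.Double(x + w, y + h), new Point2D.Double(x, y + h)
        };
        Point2D[] dst = new Point2D[4];
        ctm.transform(src, 0, dst, 0, 4);
        return dst;
    }

    public static void main(String[] args) {
        // The "square" comes out as a parallelogram away from the page corner.
        for (Point2D p : corners(CTM, 0, 0, 100, 100)) {
            System.out.println(p);
        }
    }
}
```

With this particular matrix the corners land at (300, 400), (400, 400), (450, 500) and (350, 500), i.e. a slanted shape nowhere near the lower-left corner.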

What you can do...

As you see, coordinates in PDF user space are a very dynamic matter. What you can do to tame the situation depends on the context you use your rectangle in.

Unfortunately you were quite vague in the description of what you do. Thus, this will be somewhat vague, too.

Coordinates in the page content

If you want to draw some rectangle on an existing page, you first of all need a page content stream to write to, i.e. a PDPageContentStream instance, and it should be prepared in a manner guaranteeing that the original user space coordinate system has not been disturbed. You get such an instance by using the constructor with three boolean arguments, setting all of them to true:

PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true, true);

Then you can apply a transformation to the coordinate system. You want the top left to be the origin and the y-value increasing downwards. If the crop box of the page tells you the top left has coordinates (xtl, ytl), therefore, you apply

contentStream.concatenate2CTM(new AffineTransform(1, 0, 0, -1, xtl, ytl));

and from here on you have a coordinate system you wanted, origin top left and y coordinates mirrored.
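To see what that matrix does to individual points, here is a small sketch with java.awt.geom only. The crop box values (US Letter, lower left at (0, 0), upper right at (612, 792)) and the class name TopLeftFlipDemo are assumptions chosen for illustration.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class TopLeftFlipDemo {
    /** Builds the flip for a crop box whose top-left corner is (xtl, ytl). */
    static AffineTransform topLeftFlip(double xtl, double ytl) {
        // Maps (x, y) to (x + xtl, -y + ytl).
        return new AffineTransform(1, 0, 0, -1, xtl, ytl);
    }

    public static void main(String[] args) {
        // Assumed crop box: xtl = getLowerLeftX() = 0, ytl = getUpperRightY() = 792.
        AffineTransform flip = topLeftFlip(0, 792);
        // The new origin lands on the top-left corner of the crop box.
        System.out.println(flip.transform(new Point2D.Double(0, 0), null));
        // 100 units "down" in the new system is 100 units below the top edge.
        System.out.println(flip.transform(new Point2D.Double(0, 100), null));
    }
}
```

So (0, 0) in the new system corresponds to (0, 792) in default user space, and (0, 100) corresponds to (0, 692).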

Be aware of one thing, though: if you are going to draw text too, not only is the y coordinate of the text insertion point mirrored but also the text itself, unless you counteract that with a text matrix that mirrors as well. If you want to add a lot of text, therefore, this may not be as elegant as you want.
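The counteracting idea can be checked with plain matrix arithmetic. The sketch below (java.awt.geom only; class and method names are hypothetical) concatenates the page flip with a second mirroring matrix that plays the role of the text matrix. In PDFBox 2.x such a matrix would, as far as I know, be handed to the content stream's setTextMatrix; that call is left out here so the example runs with the JDK alone.

```java
import java.awt.geom.AffineTransform;

final class TextUnflipDemo {
    /** Flip applied to the page: origin at (xtl, ytl), y pointing down. */
    static AffineTransform pageFlip(double xtl, double ytl) {
        return new AffineTransform(1, 0, 0, -1, xtl, ytl);
    }

    /** Mirroring text matrix that starts the text line at (x, y) in top-left coordinates. */
    static AffineTransform textMatrix(double x, double y) {
        return new AffineTransform(1, 0, 0, -1, x, y);
    }

    /** Effective mapping from text space to default user space. */
    static AffineTransform effective(double xtl, double ytl, double x, double y) {
        AffineTransform result = pageFlip(xtl, ytl);
        // concatenate: the text matrix is applied first, then the page flip.
        result.concatenate(textMatrix(x, y));
        return result;
    }

    public static void main(String[] args) {
        AffineTransform t = effective(0, 792, 50, 100);
        // Both scale factors are +1 again, so glyphs are upright.
        System.out.println(t.getScaleX() + " " + t.getScaleY());
        // The text starts 50 right of and 100 below the top-left corner.
        System.out.println(t.getTranslateX() + " " + t.getTranslateY());
    }
}
```

The two mirrorings cancel in the linear part, leaving only a translation to (xtl + x, ytl - y). The price is that every text positioning needs its own mirrored matrix.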

Coordinates for annotations

If you don't want to use the rectangle in the content stream but instead for adding annotations, you are not subject to the transformations mentioned above, but you cannot make use of them either.

Thus, in this context you have to take the crop box as it is and transform your rectangle accordingly.
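A minimal sketch of such a rectangle conversion, in plain Java: the crop box is passed in as two numbers (its lower-left x and upper-right y), and the result is the lower-left and upper-right corner in default user space. The helper name toUserSpace is hypothetical, and the sketch assumes an unrotated page.

```java
import java.util.Arrays;

final class AnnotationRectDemo {
    /**
     * Converts a rectangle given in top-left coordinates (y growing downwards)
     * into {lowerLeftX, lowerLeftY, upperRightX, upperRightY} in default user space.
     * cropLlx and cropUry are the crop box's lower-left x and upper-right y.
     */
    static float[] toUserSpace(float cropLlx, float cropUry,
                               float x, float y, float width, float height) {
        float llx = cropLlx + x;
        float ury = cropUry - y;   // top edge of the rectangle
        float lly = ury - height;  // bottom edge lies below the top edge
        return new float[] { llx, lly, llx + width, ury };
    }

    public static void main(String[] args) {
        // The 100 x 100 square at the top-left corner of an assumed 612 x 792 crop box.
        System.out.println(Arrays.toString(toUserSpace(0f, 792f, 0f, 0f, 100f, 100f)));
    }
}
```

For the square from the question this yields lower left (0, 692) and upper right (100, 792), which are the values an annotation rectangle on that page would need.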

Why PDFBox text extraction coordinates are as they are

Essentially for putting lines of text together in the right order and sorting the lines correctly, you don't want such a weird situation but instead a simple stable coordinate system. Some PDFBox developers chose the top-left-origin, y-increasing-downwards variant for that, and so the TextPosition coordinates have been normalized to that scheme.

In my opinion a better choice would have been to use the default user space coordinates for easier re-use of the coordinates. You might, therefore, want to try working with textPosition.getTextMatrix().getTranslateX() and textPosition.getTextMatrix().getTranslateY() for a TextPosition textPosition.

mkl
  • Thanks for the detailed response. Yes, I am indeed trying to add a link annotation, so the concatenate2CTM method is not applicable. When you said "you have to take the crop box as it is and transform your rectangle accordingly", what does that mean? Does that mean I need to transform the cropbox rectangle, or I need to transform the rectangle I am trying to draw? – Brian Jan 24 '15 at 01:56
  • *what does that mean?* - the rectangle you draw. If you changed the cropbox, you'd move all the page content. Or you retrieve your coordinates differently to begin with, c.f. the last paragraph of the answer. – mkl Jan 24 '15 at 08:39
  • I think there is a typo in your answer. You wrote `You want the top left to be the origin and the x-value increasing downwards`. I think you meant `y-value` increasing downward, not `x-value`. – Gili Jul 19 '19 at 01:02
  • where is `xtl` and `ytl` coming from? The cropBox of the page is `PDRectangle` and it doesn't give top left x (xtl) and top left y (ytl) – ᴛʜᴇᴘᴀᴛᴇʟ Apr 29 '20 at 02:19
  • The *top left* **x** is the same as the *bottom left* **x**. The *top left* **y** is the same as the *top right* **y**. Thus, `xtl` is `getLowerLeftX()` and `ytl` is `getUpperRightY()` of the crop box `PDRectangle`. – mkl Apr 29 '20 at 04:45
  • @mkl that's good to know. Thank you! As you saw, I found a different solution for now but this is definitely helpful. I'm also using the `pdfbox-layout` library (https://github.com/ralfstuckert/pdfbox-layout) for "fancy" font APIs. The library supports text line wrapping, markup text, and many other very useful features. Modifying that library to do `getTranslateX()` for text just didn't seem right - which is why I need another solution. – ᴛʜᴇᴘᴀᴛᴇʟ Apr 29 '20 at 18:25
7

The following seems to be the best way to "adjust" the TextPosition coordinates:

x_adjusted =  x_original + page.findCropBox().getLowerLeftX();
y_adjusted = -y_original + page.findCropBox().getUpperRightY();

where page is the PDPage on which the TextPosition object is located
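As a quick sanity check of these two formulas, here is a sketch in plain Java with the crop box values passed in as numbers. It also includes the inverse mapping, which is an addition for illustration. The crop box with a non-zero origin is a made-up example, and an unrotated page is assumed.

```java
final class TextPositionAdjustDemo {
    /** Top-left based (x, y), e.g. from getX()/getY(), to default user space. */
    static float[] toUserSpace(float x, float y, float cropLlx, float cropUry) {
        return new float[] { x + cropLlx, -y + cropUry };
    }

    /** Inverse mapping: default user space back to top-left based coordinates. */
    static float[] toTopLeft(float ux, float uy, float cropLlx, float cropUry) {
        return new float[] { ux - cropLlx, cropUry - uy };
    }

    public static void main(String[] args) {
        // Hypothetical crop box: lower left (20, 30), upper right (632, 822).
        float[] user = toUserSpace(72f, 100f, 20f, 822f);
        float[] back = toTopLeft(user[0], user[1], 20f, 822f);
        System.out.println(user[0] + ", " + user[1]);
        System.out.println(back[0] + ", " + back[1]);
    }
}
```

Here (72, 100) maps to (92, 722) and back again, which shows why the lower-left x of the crop box matters as soon as the crop box does not start at 0.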

Brian
2

The accepted answer created some problems for me. Also, text being mirrored and adjusting for that just didn't seem like the right solution for me. So here's what I came up with and so far, this has worked pretty smoothly.

Solution (example available below):

  • Call the getAdjustedPoints(...) method with your original points (in inches), as if you were drawing on paper where x=0 and y=0 is the top-left corner.
  • The method returns a float array (length 4) that can be used to draw a rectangle.
  • The array order is x, y, width, and height. Just pass these values to the addRect(...) method.

private float[] getAdjustedPoints(PDPage page, float x, float y, float width, float height) {
    float resizedWidth = getSizeFromInches(width);
    float resizedHeight = getSizeFromInches(height);
    return new float[] {
            getAdjustedX(page, getSizeFromInches(x)),
            getAdjustedY(page, getSizeFromInches(y)) - resizedHeight,
            resizedWidth, resizedHeight
    };
}

private float getSizeFromInches(float inches) {
    // 72 is POINTS_PER_INCH - it's defined in the PDRectangle class
    return inches * 72f;
}

private float getAdjustedX(PDPage page, float x) {
    return x + page.getCropBox().getLowerLeftX();
}

private float getAdjustedY(PDPage page, float y) {
    return -y + page.getCropBox().getUpperRightY();
}

Example:

private PDPage drawPage1(PDDocument document) {
    PDPage page = new PDPage(PDRectangle.LETTER);

    // try-with-resources closes the content stream even if drawing fails
    try (PDPageContentStream contentStream = new PDPageContentStream(
            document, page, PDPageContentStream.AppendMode.APPEND, false, false)) {
        // Gray Color Box
        contentStream.setNonStrokingColor(Color.decode(MyColors.Gallery));
        float[] p1 = getAdjustedPoints(page, 0f, 0f, 8.5f, 1f);
        contentStream.addRect(p1[0], p1[1], p1[2], p1[3]);
        contentStream.fill();

        // Disco Color Box
        contentStream.setNonStrokingColor(Color.decode(MyColors.Disco));
        p1 = getAdjustedPoints(page, 4.5f, 1f, 4f, 0.25f);
        contentStream.addRect(p1[0], p1[1], p1[2], p1[3]);
        contentStream.fill();
    } catch (Exception e) {
        // Surface the failure instead of silently returning a half-drawn page
        throw new IllegalStateException("Could not draw page 1", e);
    }

    return page;
}

As you can see, I've drawn 2 rectangle boxes.
To draw them, I used the following coordinates (in inches), which assume that x=0 and y=0 is the top left.

Gray Color Box: x=0, y=0, w=8.5, h=1
Disco Color Box: x=4.5, y=1, w=4, h=0.25
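For reference, the arithmetic behind these two boxes can be replayed without PDFBox. The sketch below repeats the calculation of getAdjustedPoints above with the crop box passed in as numbers; it assumes the LETTER crop box runs from (0, 0) to (612, 792), and the class name InchBoxDemo is hypothetical.

```java
import java.util.Arrays;

final class InchBoxDemo {
    static final float POINTS_PER_INCH = 72f;

    /** Same arithmetic as getAdjustedPoints, returning {x, y, width, height} in points. */
    static float[] adjusted(float cropLlx, float cropUry,
                            float xIn, float yIn, float wIn, float hIn) {
        float w = wIn * POINTS_PER_INCH;
        float h = hIn * POINTS_PER_INCH;
        float x = xIn * POINTS_PER_INCH + cropLlx;
        // Flip y, then step down by the height to reach the lower-left corner.
        float y = cropUry - yIn * POINTS_PER_INCH - h;
        return new float[] { x, y, w, h };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(adjusted(0f, 792f, 0f, 0f, 8.5f, 1f)));
        System.out.println(Arrays.toString(adjusted(0f, 792f, 4.5f, 1f, 4f, 0.25f)));
    }
}
```

The gray box comes out as x=0, y=720, w=612, h=72 and the Disco box as x=324, y=702, w=288, h=18, all in points measured from the lower-left corner.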

Here's an image of my result.

ᴛʜᴇᴘᴀᴛᴇʟ
  • *"text being mirrored and adjusting for that just didn't seem like the right solution for me"* - that's why I'm also not a fan of mirroring the coordinate system. But as the OP considers transforming coordinates like you do to be *not elegant* and wants a changed coordinate system instead, I did show him how to mirror the coordinate system nonetheless. – mkl Apr 29 '20 at 09:37
0

Add the height of the PDF (Easiest Solution)

  • Nope. Consider the OP's example rectangle, by adding the height of the page you move it out of the page while he wants it in the upper left corner inside the page. – mkl Jul 13 '18 at 10:51
  • If you add Page height it will not go out of the PDF, it just become inverted :) – Krishna Prasad D Jul 16 '18 at 05:21
  • *"it just become inverted"* - that's incorrect. I assume by *adding the height of the page* to the OP's `new PDRectangle(0, 0, 100, 100)` you mean *adding the height to the base point*, i.e. `new PDRectangle(0, pageHeight, 100, 100)`. This rectangle clearly is out of the page and not inverted. If you mean something different, your answer clearly lacks detail. – mkl Jul 16 '18 at 09:23