
I need to read a plan exported by AutoCAD to PDF and place some markers with text on it with PDFBox. Everything works fine, except the calculation of the width of the text, which is written next to the markers.

I skimmed through the whole PDF specification and read in detail the parts that deal with graphics and text, but to no avail. As far as I understand, the glyph coordinate space is set up in 1/1000 of the user coordinate space. Hence the width needs to be scaled by 1000, but it's still only a fraction of the real width.

This is what I am doing to position the text:

float textWidth = font.getStringWidth(marker.id) * 0.043f;
contentStream.beginText();
contentStream.setTextScaling(1, 1, 0, 0);
contentStream.moveTextPositionByAmount(
  marker.endX + marker.getXTextOffset(textWidth, fontPadding),
  marker.endY + marker.getYTextOffset(fontSize, fontPadding));
contentStream.drawString(marker.id);
contentStream.endText();

The * 0.043f works as an approximation for one document, but fails for the next. Do I need to reset any other transformation matrix except the text matrix?

EDIT: A full IDEA example project with tests and example PDFs is on GitHub: https://github.com/ascheucher/pdf-stamp-prototype

Thanks for your help!

Jeffrey Knight
andreas
  • Can you share sample documents (e.g. one where your code works and one where it doesn't) and more code, especially concerning the marker methods and how you start editing the content stream? – mkl Dec 23 '14 at 10:14
  • @mkl: I have pushed the code to GitHub. Tests and test data are included. – andreas Dec 23 '14 at 11:14
  • I'll look at it later. Currently shopping for Xmas. ;) – mkl Dec 23 '14 at 16:33
  • No hurry, I'm in the middle of Christmas preparations as well. I will not have time for it till January anyway... But thanks in advance! Merry Christmas to you and your family! – andreas Dec 23 '14 at 17:19
  • Thank you. I hope you had a great holiday time. I am currently looking into the sample. I use Maven, not IDEA, so some minute patches were necessary. Could you indicate which test shows the failure and which the success? As you set most tests to `@Ignore`, I assume the remaining two tests demonstrate the issue, don't they? – mkl Jan 05 '15 at 11:24
  • Hi, @mkl. The holiday was fine :) thanks. The tests set to ignore either test other behavior or use test PDF files I could not include. The active tests draw the markers on two distinct plans. They have different sizes, hence the different scale. The annotating text should be center aligned on the top and bottom marker, aligned to the left on the right marker and aligned to the right on the left marker. The alignment is not working for me, as font.getStringWidth( .. ) returns only a fraction of what it seems to be. And the discrepancy seems to be different in both PDFs. – andreas Jan 05 '15 at 20:41
  • Which viewer do you use? I try to view the test outputs using Adobe Reader XI but it tells me **An error exists on this page. Acrobat may not display the page correctly.** Thereafter it shows only the original plans, at least I see no markers at all. – mkl Jan 06 '15 at 08:37
  • Ah, the reason for that error is that you use a **CalRGB** color space without **WhitePoint** (which is a required value). As you thereafter use `DeviceRGB` colors, though, that should not matter. – mkl Jan 06 '15 at 09:05
  • Ok. I used the native Ubuntu PDF viewer. It did not complain, but good to know. – andreas Jan 06 '15 at 13:14

1 Answer


Unfortunately the question and comments merely include (by running the sample project) the actual result for two source documents and the description

The annotating text should be center aligned on the top and bottom marker, aligned to the left on the right marker and aligned to the right on the left marker. The alignment is not working for me, as font.getStringWidth( .. ) returns only a fraction of what it seems to be. And the discrepancy seems to be different in both PDFs.

but not a concrete sample discrepancy to repair.

There are several issues in the code, though, which may lead to such observations (and other ones, too!). Fixing them should be done first; this may already resolve the issues observed by the OP.

Which box to take

The code of the OP derives several values from the media box:

PDRectangle pageSize = page.findMediaBox();
float pageWidth = pageSize.getWidth();
float pageHeight = pageSize.getHeight();
float lineWidth = Math.max(pageWidth, pageHeight) / 1000;
float markerRadius = lineWidth * 10;
float fontSize = Math.min(pageWidth, pageHeight) / 20;
float fontPadding = Math.max(pageWidth, pageHeight) / 100;

These seem to be chosen to be optically pleasing in relation to the page size. But in general it is the crop box, not the media box, that defines the final displayed or printed page size. Thus, it should be

PDRectangle pageSize = page.findCropBox();

(Actually the trim box, the intended dimensions of the finished page after trimming, might even be more apropos; the trim box defaults to the crop box. For details read here.)

This is not relevant for the given sample documents as they do not contain explicit crop box definitions, so the crop box defaults to the media box. It might be relevant for other documents, though, e.g. those the OP could not include.

Which PDPageContentStream constructor to use

The code of the OP adds a content stream to the page at hand using this constructor:

PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);

This constructor appends (first true) and compresses (second true) but unfortunately it continues in the graphics state left behind by the pre-existing content.

Details of the graphics state of importance for the observations at hand:

  • Transformation matrix - it may have been changed to scale (or rotate, skew, move ...) any new content added
  • Character spacing - it may have been changed to put any new characters added nearer to or farther from each other
  • Word spacing - it may have been changed to put any new words added nearer to or farther from each other
  • Horizontal scaling - it may have been changed to scale any new characters added
  • Text rise - it may have been changed to displace any new characters added vertically

Thus, a constructor should be chosen which also resets the graphics state:

PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true, true);

The third true tells PDFBox to reset the graphics state, i.e. to surround the former content with a save-state/restore-state operator pair.

This is relevant for the given sample documents: in them, at least the transformation matrix is changed.

Setting and using the CalRGB color space

The OP's code sets the stroking and non-stroking color spaces to a calibrated color space:

contentStream.setStrokingColorSpace(new PDCalRGB());
contentStream.setNonStrokingColorSpace(new PDCalRGB());

Unfortunately new PDCalRGB() does not create a valid CalRGB color space object because its required WhitePoint value is missing. Thus, before selecting a calibrated color space, initialize it properly.

Thereafter the OP's code sets the colors using

contentStream.setStrokingColor(marker.color.r, marker.color.g, marker.color.b);
contentStream.setNonStrokingColor(marker.color.r, marker.color.g, marker.color.b);

These (int, int, int) overloads unfortunately use the RG and rg operators implicitly selecting the DeviceRGB color space. To not overwrite the current color space, use the (float[]) overloads with normalized (0..1) values instead.
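The normalization mentioned above can be sketched as follows. This is only an illustration: the class and method names are hypothetical, and it assumes that marker.color.r, marker.color.g and marker.color.b are ints in the range 0..255.

```java
// Sketch: turn 0..255 integer RGB components into the normalized
// 0..1 float components expected by the (float[]) color overloads.
// Class and method names are hypothetical, chosen for illustration.
final class ColorComponentsSketch {

    static float[] toUnitRange(int r, int g, int b) {
        return new float[] { clamp(r) / 255f, clamp(g) / 255f, clamp(b) / 255f };
    }

    // Keeps out-of-range inputs inside 0..255 before scaling.
    private static int clamp(int value) {
        return Math.max(0, Math.min(255, value));
    }

    public static void main(String[] args) {
        float[] components = toUnitRange(255, 128, 0);
        System.out.println(components[0] + " " + components[1] + " " + components[2]);
    }
}
```

The resulting array would then be handed to contentStream.setStrokingColor(float[]) and contentStream.setNonStrokingColor(float[]) in place of the three int arguments.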

While this is not relevant for the observed issue, it causes error messages in PDF viewers.

Calculating the width of a drawn string

The OP's code calculates the width of a drawn string using

float textWidth = font.getStringWidth(marker.id) * 0.043f;

and the OP is surprised

The * 0.043f works as an approximation for one document, but fails for the next.

There are two factors building this "magic" number:

  • As the OP has remarked, the glyph coordinate space is set up in 1/1000 of the user coordinate space, and the number returned by getStringWidth is in glyph space. This accounts for a factor of 0.001.

  • What the OP has overlooked is that he wants the width of the string at the font size he selected. The font object has no knowledge of the current font size, though, and returns the width for a font size of 1. As the OP selects the font size dynamically as Math.min(pageWidth, pageHeight) / 20, this factor varies. For the two given sample documents it is about 42, but it is probably totally different in other documents.
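Combining the two factors, the width can be computed without a magic number. The following is a minimal sketch with hypothetical names; it assumes default text state (100% horizontal scaling, no character or word spacing) and an unscaled text matrix, and takes the value returned by font.getStringWidth as its first argument.

```java
// Sketch: convert a string width given in glyph space (as returned by
// font.getStringWidth) into the width at the selected font size.
// Class, method and constant names are hypothetical.
final class TextWidthSketch {

    // 1000 glyph space units correspond to one unit at font size 1.
    static final float GLYPH_UNITS_PER_UNIT = 1000f;

    static float scaledWidth(float glyphSpaceWidth, float fontSize) {
        return glyphSpaceWidth / GLYPH_UNITS_PER_UNIT * fontSize;
    }

    public static void main(String[] args) {
        // Made-up example values: 1668 glyph units at font size 42.
        System.out.println(scaledWidth(1668f, 42f));
    }
}
```

In the OP's code this would amount to font.getStringWidth(marker.id) / 1000 * fontSize instead of the fixed * 0.043f.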

Positioning text

The OP's code positions the text like this starting from identity text matrices:

contentStream.moveTextPositionByAmount(
    marker.endX + marker.getXTextOffset(textWidth, fontPadding),
    marker.endY + marker.getYTextOffset(fontSize, fontPadding));

using methods getXTextOffset and getYTextOffset:

public float getXTextOffset(float textWidth, float fontPadding) {
    if (getLocation() == Location.TOP)
        return (textWidth / 2 + fontPadding) * -1;
    else if (getLocation() == Location.BOTTOM)
        return (textWidth / 2 + fontPadding) * -1;
    else if (getLocation() == Location.RIGHT)
        return 0 + fontPadding;
    else
        return (textWidth + fontPadding) * -1;
}

public float getYTextOffset(float fontSize, float fontPadding) {
    if (getLocation() == Location.TOP)
        return 0 + fontPadding;
    else if (getLocation() == Location.BOTTOM)
        return (fontSize + fontPadding) * -1f;
    else
        return fontSize / 2 * -1;
}

In case of getXTextOffset I doubt that adding fontPadding for Location.TOP and Location.BOTTOM makes sense, especially in the light of the OP's desire

The annotating text should be center aligned on the top and bottom marker

For the text to be centered it should not be shifted off-center.
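A variant of getXTextOffset that centers without the extra shift might look like the sketch below. It is written as a static method with an explicit location parameter so that it stands alone; the Location enum mirrors the one used in the OP's code, and everything else is an assumption for illustration.

```java
// Sketch: horizontal text offset relative to the marker end point.
// TOP and BOTTOM labels are centered without padding; RIGHT and LEFT
// keep the padding so the text does not touch the marker.
final class XOffsetSketch {

    enum Location { TOP, BOTTOM, LEFT, RIGHT }

    static float xTextOffset(Location location, float textWidth, float fontPadding) {
        switch (location) {
            case TOP:
            case BOTTOM:
                return -textWidth / 2;             // centered on the marker
            case RIGHT:
                return fontPadding;                // text starts right of the marker
            default:
                return -(textWidth + fontPadding); // LEFT: text ends left of the marker
        }
    }

    public static void main(String[] args) {
        System.out.println(xTextOffset(Location.TOP, 84f, 5f));
    }
}
```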

The case of getYTextOffset is more difficult. The OP's code is built upon two misunderstandings: It assumes

  • that the text position selected by moveTextPositionByAmount is the lower left, and
  • that the font size is the character height.

Actually the text position lies on the baseline; the glyph origin of the next drawn glyph will be placed there, e.g.

Glyph origin, width, and bounding box for 'g'

Thus, the y position either has to be corrected to take the descent into account (for centering on the whole glyph height) or has to use only the ascent (for centering on the above-baseline glyph height).

And a font size does not denote the actual character height but is arranged so that the nominal height of tightly spaced lines of text is 1 unit for font size 1. "Tightly spaced" implies that some small amount of additional inter-line space is contained in the font size.

In essence for centering vertically one has to decide what to center on, whole height or above-baseline height, first letter only, whole label, or all font glyphs. PDFBox does not readily supply the necessary information for all cases but methods like PDFont.getFontBoundingBox() should help.
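As one possible reading of the above, the following sketch computes how far the baseline has to be shifted so that a vertical extent given in glyph space ends up centered on the marker. The names are hypothetical. The lower and upper values are assumed to come from a glyph-space rectangle such as the one returned by PDFont.getFontBoundingBox(), and which extent to center on remains the design decision described above.

```java
// Sketch: vertical baseline offset that centers a glyph-space extent
// [lowerY, upperY] on the target y coordinate at a given font size.
// Class and method names are hypothetical.
final class VerticalCenterSketch {

    static float baselineOffset(float lowerY, float upperY, float fontSize) {
        // Middle of the extent, relative to the baseline, in glyph space.
        float middle = (lowerY + upperY) / 2;
        // Scale to the selected font size and shift the baseline the other way.
        return -middle / 1000f * fontSize;
    }

    public static void main(String[] args) {
        // Made-up extent: descent -200, ascent 800, font size 10.
        System.out.println(baselineOffset(-200f, 800f, 10f));
    }
}
```

Passing 0 as lowerY would center on the above-baseline height only; passing the descent would center on the whole glyph height.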

mkl
  • Thanks, there is just one point which is not clear to me: what is the font bounding box? It's clear what a glyph's bounding box describes, but the font knows neither which characters it describes nor which size it has. The second can be scaled with the calculated font size, of course, but I don't understand the font bounding box. – andreas Jan 07 '15 at 16:23
  • According to the specification: **A rectangle (see 7.9.5, "Rectangles"), expressed in the glyph coordinate system, that shall specify the font bounding box. This should be the smallest rectangle enclosing the shape that would result if all of the glyphs of the font were placed with their origins coincident and then filled.** – mkl Jan 07 '15 at 16:36
  • Wish Stackoverflow had some way to reward great answers like this one – Edi Mar 09 '16 at 20:45