Is there a way to get the bounding box of a text line using PDFBox?
-
Yes, there is a way. – mkl Oct 30 '15 at 10:09
-
Your question is a bit vague. Do you want to measure a string's bounds when drawn with a specific `PDFont`? Or do you have a content stream that produces a line of text and want to know its bounds when rendered? Do you want to know the answer based on a particular graphics state (for example, are any transformations applied)? – Andreas Mayer Oct 30 '15 at 15:56
-
I would say the second case: `content stream that produces a line of text and want to know its bounds when rendered`. – rivu Oct 30 '15 at 18:01
-
In the 2.0 version, have a look at the `DrawPrintTextLocations` example. – Tilman Hausherr Nov 04 '15 at 12:59
3 Answers
If you want to compute the bounding box of a content stream that produces text lines (or of any other content stream), you have to process the content stream and keep track of the bounds of the areas being painted. You don't have to actually draw the page.
In order to do so, you should extend `PDFStreamEngine` and override all methods that construct paths (including the clipping path), fill and/or stroke a path, and show glyphs. Note that PDFBox 2.0.0 provides the new subclass `org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine`, which should make the task easier for you, but you should also be able to implement this with 1.8.x; it just takes a little more effort. See `org.apache.pdfbox.rendering.PageDrawer` and `org.apache.pdfbox.examples.rendering.CustomGraphicsStreamEngine` for example implementations of `PDFStreamEngine`.
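The bookkeeping described above can be sketched without PDFBox. The class `PaintedBoundsTracker` below is hypothetical: its methods only mirror the kinds of callbacks you would override in a `PDFGraphicsStreamEngine` subclass (path construction, clipping, fill/stroke, glyphs), and it assumes the coordinates it receives are already in page space. Clip paths and stroke widths are approximated by rectangles, so the result is a conservative box rather than an exact one.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical tracker for the area painted by a content stream.
 * Its methods mirror the callbacks a graphics stream engine receives;
 * all coordinates are assumed to be in page space already.
 */
final class PaintedBoundsTracker {
    private Path2D.Double currentPath = new Path2D.Double();
    private Rectangle2D clip;                                  // current clip, approximated by a rectangle
    private final Deque<Rectangle2D> savedClips = new ArrayDeque<>();
    private Rectangle2D painted;                               // union of painted areas, null if none

    /** The initial clip is the visible page area (e.g. the crop box). */
    PaintedBoundsTracker(Rectangle2D pageBox) {
        this.clip = pageBox.getBounds2D();                     // defensive copy
    }

    // Path construction (m, l, c, h)
    void moveTo(double x, double y) { currentPath.moveTo(x, y); }
    void lineTo(double x, double y) { currentPath.lineTo(x, y); }
    void curveTo(double x1, double y1, double x2, double y2, double x3, double y3) {
        currentPath.curveTo(x1, y1, x2, y2, x3, y3);
    }
    void closePath() { currentPath.closePath(); }

    // Clipping (W / W*) and graphics state save/restore (q / Q)
    /** Intersects the clip with the current path's bounding box (conservative). */
    void clip() { clip = clip.createIntersection(currentPath.getBounds2D()); }
    void saveState() { savedClips.push(clip); }
    void restoreState() { clip = savedClips.pop(); }

    // Painting (f, S, n) and glyphs
    void fill() {
        addPainted(currentPath.getBounds2D());
        endPath();
    }

    /** Grows the path bounds by half the line width; miter joins can reach further. */
    void stroke(double lineWidth) {
        Rectangle2D b = currentPath.getBounds2D();
        double half = lineWidth / 2;
        addPainted(new Rectangle2D.Double(b.getX() - half, b.getY() - half,
                b.getWidth() + lineWidth, b.getHeight() + lineWidth));
        endPath();
    }

    /** Discards the current path without painting it (n); also used after fill and stroke. */
    void endPath() { currentPath = new Path2D.Double(); }

    /** Records a glyph box; call it only for glyphs you decide count as paint. */
    void showGlyph(Rectangle2D glyphBox) { addPainted(glyphBox); }

    /** Bounding box of everything painted so far, or null if nothing was. */
    Rectangle2D getPaintedBounds() { return painted == null ? null : painted.getBounds2D(); }

    private void addPainted(Rectangle2D area) {
        Rectangle2D visible = area.createIntersection(clip);
        if (visible.isEmpty()) {
            return;                                            // fully clipped or zero-area
        }
        painted = (painted == null) ? visible : painted.createUnion(visible);
    }
}
```

In an actual engine subclass you would forward each overridden callback to such a tracker; computing each glyph's box from the font and the text rendering matrix is not covered by this sketch (the `DrawPrintTextLocations` example mentioned in the comments is a good place to look).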
Also note that there is text rendering mode 3, in which text operators neither stroke nor fill glyphs (invisible text). It's up to you whether you treat text shown in this mode as paint or not. The same goes for fill or stroke operations with a transparent color.
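One way to make this choice explicit is a small policy helper consulted before a glyph is added to the box. `TextPaintPolicy` is a hypothetical name; it takes the raw text rendering mode (the operand of the `Tr` operator, 0 to 7) and constant fill/stroke opacities, so it does not depend on any PDFBox type. Soft masks and blend modes are ignored in this sketch.

```java
/**
 * Hypothetical policy deciding whether glyphs shown with a given text
 * rendering mode count as "paint" for the bounding box.
 * Modes (PDF Tr operator): 0 fill, 1 stroke, 2 fill+stroke, 3 invisible,
 * 4-6 like 0-2 but also added to the clip, 7 clip only.
 */
final class TextPaintPolicy {
    private final boolean treatInvisibleAsPaint;

    TextPaintPolicy(boolean treatInvisibleAsPaint) {
        this.treatInvisibleAsPaint = treatInvisibleAsPaint;
    }

    static boolean fills(int mode)   { return mode == 0 || mode == 2 || mode == 4 || mode == 6; }
    static boolean strokes(int mode) { return mode == 1 || mode == 2 || mode == 5 || mode == 6; }

    /**
     * @param mode        text rendering mode, 0..7
     * @param fillAlpha   constant fill opacity, 0 = fully transparent
     * @param strokeAlpha constant stroke opacity, 0 = fully transparent
     */
    boolean countsAsPaint(int mode, double fillAlpha, double strokeAlpha) {
        if (mode < 0 || mode > 7) {
            throw new IllegalArgumentException("text rendering mode must be 0..7, got " + mode);
        }
        boolean visible = (fills(mode) && fillAlpha > 0) || (strokes(mode) && strokeAlpha > 0);
        // Invisible text (modes 3 and 7) and fully transparent text count only if requested.
        return visible || treatInvisibleAsPaint;
    }
}
```

Whether `treatInvisibleAsPaint` should be true depends on what the box is for; OCR text layers in scanned PDFs, for instance, are often drawn in mode 3, so you may want to include them.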

To all the others out there searching for a simple solution:
Extend `org.apache.pdfbox.text.PDFTextStripper` and override its member function `writeString(String, List<TextPosition>)`.
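As a sketch of the computation inside such an override, the helper below takes the union of per-glyph boxes. `GlyphBox` and `TextBounds` are hypothetical; `GlyphBox` just holds the values you could read from each `TextPosition` via `getXDirAdj()`, `getYDirAdj()`, `getWidthDirAdj()` and `getHeightDir()` (top-left origin, y growing downwards). Keep in mind that `PDFTextStripper` may call `writeString` with only part of a line (for example a single word), and the reported glyph height can be approximate, so treat the result as an estimate.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

/**
 * Hypothetical per-glyph values, modeled on what PDFBox's TextPosition exposes
 * (getXDirAdj(), getYDirAdj(), getWidthDirAdj(), getHeightDir()).
 * baselineY is measured from the top of the page and grows downwards.
 */
final class GlyphBox {
    final float x, baselineY, width, height;

    GlyphBox(float x, float baselineY, float width, float height) {
        this.x = x;
        this.baselineY = baselineY;
        this.width = width;
        this.height = height;
    }
}

final class TextBounds {
    private TextBounds() {}

    /** Union of the glyph boxes (top-left origin), or null for an empty list. */
    static Rectangle2D union(List<GlyphBox> glyphs) {
        Rectangle2D result = null;
        for (GlyphBox g : glyphs) {
            // The glyph box spans from (baseline - height) down to the baseline.
            Rectangle2D box = new Rectangle2D.Double(g.x, g.baselineY - g.height, g.width, g.height);
            result = (result == null) ? box : result.createUnion(box);
        }
        return result;
    }
}
```

In `writeString` you would build one `GlyphBox` per `TextPosition` and pass the list to `TextBounds.union`; boxes from several calls on the same line can be merged the same way with `createUnion`.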

You can create a `CustomPDFTextStripper` which extends `PDFTextStripper` and overrides `protected void writeString(String text, List<TextPosition> textPositions)`. In this overridden method you need to compute the coordinates of the bounding box from the `List<TextPosition>`. You can check my answer https://stackoverflow.com/a/62966618/2598453, where you can also find a working solution for getting bounding boxes for each word.
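For per-word boxes, one simple approach (not necessarily the one used in the linked answer) is to split the glyph run at whitespace glyphs and take a union per word. `PositionedGlyph` and `WordBoxes` are hypothetical names; `PositionedGlyph` stands in for a `TextPosition`, with `unicode` corresponding to `getUnicode()`. This only works when the text positions contain actual space glyphs; gaps that the stripper infers as word separators are not represented here.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a TextPosition: its text plus its box values. */
final class PositionedGlyph {
    final String unicode;
    final float x, baselineY, width, height;   // top-left origin, y grows downwards

    PositionedGlyph(String unicode, float x, float baselineY, float width, float height) {
        this.unicode = unicode;
        this.x = x;
        this.baselineY = baselineY;
        this.width = width;
        this.height = height;
    }
}

final class WordBoxes {
    private WordBoxes() {}

    /** Splits a glyph run at whitespace glyphs and returns one bounding box per word. */
    static List<Rectangle2D> perWord(List<PositionedGlyph> glyphs) {
        List<Rectangle2D> words = new ArrayList<>();
        Rectangle2D current = null;
        for (PositionedGlyph g : glyphs) {
            if (isWhitespace(g.unicode)) {
                if (current != null) {              // a space closes the current word
                    words.add(current);
                    current = null;
                }
                continue;
            }
            Rectangle2D box = new Rectangle2D.Double(g.x, g.baselineY - g.height, g.width, g.height);
            current = (current == null) ? box : current.createUnion(box);
        }
        if (current != null) {                      // last word has no trailing space
            words.add(current);
        }
        return words;
    }

    private static boolean isWhitespace(String s) {
        return s != null && !s.isEmpty() && s.codePoints().allMatch(Character::isWhitespace);
    }
}
```

Glyphs without a Unicode mapping (`unicode == null`) are kept as part of the current word, since they are still painted.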
