Is there a way to get the bounding box of a text line using PDFBox?

rivu
  • Yes, there is a way. – mkl Oct 30 '15 at 10:09
  • Your question is a bit vague. Do you want to measure a string's bounds when drawn with a specific PDFont? Or do you have a content stream that produces a line of text and want to know its bounds when rendered? Do you want to know the answer based on a particular graphics state (for example, are any transformations applied)? – Andreas Mayer Oct 30 '15 at 15:56
  • I would say, `content stream that produces a line of text and want to know its bounds when rendered`. – rivu Oct 30 '15 at 18:01
  • In the 2.0 version, have a look at the DrawPrintTextLocations example. – Tilman Hausherr Nov 04 '15 at 12:59

3 Answers

If you want to compute the bounding box of a content stream that produces text lines (or of any other content stream), you have to process the content stream and keep track of the bounds of the areas being painted. You don't have to actually draw the page.

To do so, extend PDFStreamEngine and override all methods that construct paths (including the clipping path), fill and/or stroke a path, and show glyphs. Note that PDFBox 2.0.0 provides the new subclass org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine, which should make the task easier, but you should also be able to implement this with 1.8.x; it just takes a little more effort. See org.apache.pdfbox.rendering.PageDrawer and org.apache.pdfbox.examples.rendering.CustomGraphicsStreamEngine for example implementations of PDFStreamEngine.

Also note that there is text rendering mode 3, in which text operators neither stroke nor fill glyphs (invisible text). It's up to you whether you count text shown in this mode as painted or not. The same goes for fill or stroke operations with a transparent color.
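
The bookkeeping part of this approach can be sketched without any PDFBox code. In the minimal sketch below, `PaintedBoundsTracker`, `addPainted`, `addGlyph`, and the `includeInvisibleText` flag are hypothetical names for illustration, not part of PDFBox. The assumption is that your stream engine subclass calls these methods from its overridden path and glyph callbacks and passes the current transformation matrix as a `java.awt.geom.AffineTransform`. Clipping and transparent colors are not handled here.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Rectangle2D;

/** Hypothetical helper: accumulates the union of all painted areas. */
class PaintedBoundsTracker {
    private final boolean includeInvisibleText;
    private Rectangle2D bounds; // stays null until something is painted

    PaintedBoundsTracker(boolean includeInvisibleText) {
        this.includeInvisibleText = includeInvisibleText;
    }

    /** Adds a painted rectangle given in user space, mapped through the given matrix. */
    void addPainted(Rectangle2D userSpaceRect, AffineTransform ctm) {
        // Transform first, then take the axis-aligned bounds of the result,
        // so rotated or skewed content is still fully covered.
        Rectangle2D mapped = ctm.createTransformedShape(userSpaceRect).getBounds2D();
        bounds = (bounds == null) ? mapped : bounds.createUnion(mapped);
    }

    /** Adds a glyph box; text rendering mode 3 (invisible) is skipped unless configured otherwise. */
    void addGlyph(Rectangle2D glyphBox, AffineTransform ctm, int renderingMode) {
        if (renderingMode == 3 && !includeInvisibleText) {
            return;
        }
        addPainted(glyphBox, ctm);
    }

    /** Returns the accumulated bounds, or null if nothing was painted. */
    Rectangle2D getBounds() {
        return bounds;
    }
}
```

Keeping the invisible-text decision in a constructor flag makes the choice explicit in one place instead of scattering it over the callbacks.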

Andreas Mayer

To all the others out there searching for a simple solution: Extend org.apache.pdfbox.text.PDFTextStripper and override its member function writeString(String, List<TextPosition>).

Sebastian Schmitt

You can create a CustomPDFTextStripper class that extends PDFTextStripper and overrides protected void writeString(String text, List<TextPosition> textPositions). In this overridden method you need to compute the coordinates of the bounding box from the List<TextPosition>. See my answer at https://stackoverflow.com/a/62966618/2598453, where you can also find a working solution for getting the bounding box of each word.
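
The bounding-box computation itself is a union over the glyph boxes. Here is a minimal sketch of that step using only the JDK. `LineBounds` and `Glyph` are hypothetical names; `Glyph` stands in for the values you would read from each TextPosition (for example via getXDirAdj(), getYDirAdj(), getWidthDirAdj() and getHeightDir()). It assumes that the y value is the baseline in a top-down coordinate system, so each glyph box extends upward from it. Verify that assumption against your documents.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

class LineBounds {
    /** Hypothetical stand-in for the per-glyph values taken from a TextPosition. */
    static final class Glyph {
        final double x, baselineY, width, height;

        Glyph(double x, double baselineY, double width, double height) {
            this.x = x;
            this.baselineY = baselineY;
            this.width = width;
            this.height = height;
        }
    }

    /** Returns the union of all glyph boxes, or null for an empty list. */
    static Rectangle2D boundsOf(List<Glyph> glyphs) {
        Rectangle2D box = null;
        for (Glyph g : glyphs) {
            // Assumption: y grows downward and baselineY is the bottom of the glyph box.
            Rectangle2D glyphBox =
                    new Rectangle2D.Double(g.x, g.baselineY - g.height, g.width, g.height);
            box = (box == null) ? glyphBox : box.createUnion(glyphBox);
        }
        return box;
    }
}
```

Inside an overridden writeString you would build one `Glyph` per TextPosition and call `boundsOf` on the resulting list. Descenders and rotated text are not accounted for in this sketch.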

Milan Hlinák