
I have a PDF file from which I want to extract the bbox coordinates of a specific piece of text, "Contact Name:".

I am using pyquery with a statement like pdf.pq('LTTextLineHorizontal:contains("Contact Name:")') and then read a coordinate with float(ContactNameLocation.attr('y0')). However, there are multiple "LTTextLineHorizontal" elements at the same hierarchy level as the one that holds the text.

Contact Name:

Only the first set of bbox coordinates is reported, even though the element that actually holds the text is the last of the matches. How can I get the coordinates of the last bbox?

Thanks for your help

  • Example of the PDF converted to XML: " Contact Name: ". pdf.pq('LTTextLineHorizontal:contains("Contact Name:")') returns bbox="[62.88, 372.072, 306.24, 388.304]" instead of bbox="[68.16, 364.152, 125.761, 372.152]". How can I get the second bbox instead of the first one? – laurentzed Jul 26 '23 at 21:07
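One way to think about it: .attr() on a multi-element selection only reads the first matched element, so the idea is to collect every match and then pick the one you want (the last one, or the one with the smallest bbox). Below is a minimal sketch of that idea using only the standard library on a made-up XML fragment. The fragment, the LTPage wrapper, the sample text and the helper names matching_bboxes and bbox_area are assumptions for illustration; only the two bbox values come from the comment above, and the real converted tree may be nested differently.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML, loosely modelled on the two bboxes quoted in the question.
# The real output of the PDF-to-XML conversion may be structured differently.
SAMPLE_XML = """
<LTPage>
  <LTTextLineHorizontal bbox="[62.88, 372.072, 306.24, 388.304]">Details Contact Name: (see below)</LTTextLineHorizontal>
  <LTTextLineHorizontal bbox="[68.16, 364.152, 125.761, 372.152]">Contact Name: </LTTextLineHorizontal>
</LTPage>
"""


def matching_bboxes(root, tag, needle):
    """Return the bbox of every <tag> element whose text contains needle,
    in document order (hypothetical helper)."""
    boxes = []
    for element in root.iter(tag):
        text = "".join(element.itertext())
        if needle in text:
            # The bbox attribute is a string such as "[x0, y0, x1, y1]".
            boxes.append(json.loads(element.get("bbox")))
    return boxes


def bbox_area(bbox):
    """Area of an [x0, y0, x1, y1] box (hypothetical helper)."""
    x0, y0, x1, y1 = bbox
    return (x1 - x0) * (y1 - y0)


root = ET.fromstring(SAMPLE_XML)
boxes = matching_bboxes(root, "LTTextLineHorizontal", "Contact Name:")

# Option 1: take the last match instead of the first.
last_bbox = boxes[-1]
x0, y0, x1, y1 = last_bbox

# Option 2: take the tightest match, which does not depend on element order.
tightest_bbox = min(boxes, key=bbox_area)
```

With pyquery itself, iterating over the selection or indexing it (for example taking the last element of the result rather than calling .attr() on the whole selection) should give the same effect, but I have not checked that against your file.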

0 Answers