I'm doing pdftotext -bbox file.pdf and that produces word-level output. Is there a way to output coordinates on the character/phrase/line/block level?
I'm interested in knowing if either the poppler or xpdf version of pdftotext can do this.
Sure, with poppler's pdftotext just use pdftotext -bbox-layout file.pdf
and it will give you bounding boxes for blocks and lines in addition to the word-level ones.
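If you want to post-process the result, here is a rough Python sketch of how the output could be read. It assumes the -bbox-layout output is XHTML with nested block, line and word elements that each carry xMin, yMin, xMax and yMax attributes, and that passing "-" as the output file sends it to stdout. The function names run_bbox_layout and extract_boxes are made up for this example, so check everything against the output of your own poppler build.

```python
import subprocess
import xml.etree.ElementTree as ET

# Element names assumed to appear in the -bbox-layout output.
LEVELS = ("block", "line", "word")


def run_bbox_layout(pdf_path):
    """Run poppler's pdftotext and return its XHTML output as bytes.

    Assumes "-" as the output file name writes to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-bbox-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout


def local_name(tag):
    """Strip an XML namespace such as '{http://www.w3.org/1999/xhtml}word'."""
    return tag.rsplit("}", 1)[-1]


def extract_boxes(xhtml):
    """Return (level, (xMin, yMin, xMax, yMax), text) tuples in document order."""
    root = ET.fromstring(xhtml)
    boxes = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name not in LEVELS:
            continue
        # Collapse the whitespace between child words into single spaces.
        text = " ".join("".join(elem.itertext()).split())
        bbox = tuple(float(elem.get(k)) for k in ("xMin", "yMin", "xMax", "yMax"))
        boxes.append((name, bbox, text))
    return boxes
```

Calling extract_boxes(run_bbox_layout("file.pdf")) should then give you one tuple per block, line and word, which you can filter by the first field to get the level you care about.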