I'm processing a PDF document in a program.
The only part of the document I have access to is a list of PDF operations (with their arguments), and a list of horizontal displacements for the glyphs and fonts that appear in the document.
Is it possible to calculate from this the coordinates of each string on any given page? By "a string" I mean either the string argument of a Tj, ' or " operator, or a string element of the array argument of a TJ operator. I don't care which coordinate space these coordinates are defined in, or what their units are, only that the space and units are the same for every point, since I'm mostly trying to calculate relative distances, not to display the text properly.
In case it's relevant, the PDF document in question doesn't have any images or vertical text, but it can have multiple text objects on a single page, and the strings are not drawn in reading order (moreover, the order they're drawn in changes from page to page).
I've tried to figure this out myself from the PDF reference document, but I always have problems with linear algebra, so I'm having a really hard time understanding how transformations and the different coordinate spaces actually work. I've been trying to use the Tm[3,0] and Tm[3,1] elements of the text matrix Tm as coordinates, and that mostly worked (when I order strings by those elements, they are usually in the correct reading order), but there are still issues: on some pages the order comes out completely wrong, and in some cases symbols that appear really close to each other on the page end up with a larger distance between them than symbols that appear far away from each other.
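A minimal Python sketch of that bookkeeping, under stated assumptions rather than as a full solution: the operators are assumed to be already parsed into (operands, operator) pairs, the CTM is taken as identity unless one is passed in, and the text matrix is not advanced by glyph widths after a show operator, so only the first string after each positioning operator gets a trustworthy origin. The names multiply, string_origins and ops are hypothetical and exist only for illustration.

```python
# A matrix [a b 0; c d 0; e f 1] is stored as the 6-tuple (a, b, c, d, e, f).
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m, n):
    """Return m x n for row vectors [x y 1], i.e. apply m first, then n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def string_origins(ops, ctm=IDENTITY):
    """Return (text operand, x, y) for every text-showing operator in ops.

    ops is a hypothetical pre-parsed list of (operands, operator) pairs.
    """
    tm = tlm = IDENTITY      # text matrix and text line matrix
    leading = 0.0            # TL value, used by T*, ' and "
    out = []

    def next_line(tx, ty):
        # Td semantics: Tlm = translate(tx, ty) x Tlm, then Tm = Tlm.
        nonlocal tm, tlm
        tlm = multiply((1.0, 0.0, 0.0, 1.0, tx, ty), tlm)
        tm = tlm

    for operands, op in ops:
        if op == "BT":
            tm = tlm = IDENTITY
        elif op == "Tm":
            tm = tlm = tuple(float(v) for v in operands)
        elif op == "TL":
            leading = float(operands[0])
        elif op == "Td":
            next_line(*operands)
        elif op == "TD":
            leading = -operands[1]
            next_line(*operands)
        elif op == "T*":
            next_line(0.0, -leading)
        elif op in ("Tj", "TJ", "'", '"'):
            if op in ("'", '"'):
                next_line(0.0, -leading)
            # Origin (0, 0) of text space mapped through Tm and then the CTM.
            trm = multiply(tm, ctm)
            out.append((operands[-1], trm[4], trm[5]))
            # NOT modelled (assumption of this sketch): advancing tm by the
            # glyph widths, Tc, Tw, font size and TJ adjustments.
    return out


# Hypothetical parsed form of a short operator sequence.
ops = [
    ([], "BT"),
    (["WOW"], "Tj"),
    ([1.359, 0.041], "Td"),
    ([], "T*"),
    (["SECOND"], "Tj"),
    ([0, 7.6068, -7.3001, 0, 269.54, 245.01], "Tm"),
    (["LAST"], "Tj"),
    ([], "ET"),
]
for row in string_origins(ops):
    print(row)
```

Note that Td offsets are expressed in the current text space, so with a rotated matrix such as 0 7.3001 -7.3001 0 e f, a Td of (1, 0) moves the origin along the page's y axis rather than its x axis. That may be one reason why sorting by the raw translation elements sometimes gives an unexpected order.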
For example, let's say I have this sequence of operators:
BT
0 7.3001 -7.3001 0 124.64 301.79 Tm
A Tj
/T1_2 1 Tf
E Tj
0.0157 Tc
0 7.3001 -7.3001 0 124.64 518.99 Tm
SOME Tj
ET
BT
WOW Tj
1.359 0.041 Td
T*
SECOND Tj
0 7.6068 -7.3001 0 269.54 245.01 Tm
LAST Tj
ET
How would one calculate the coordinates of the strings in the resulting file?