
I am somewhat confused about the current transformation matrix (CTM) in PDFs. For page 5 of this PDF, I have examined the token stream (http://pastebin.com/k6g4BGih), and it shows that the last cm operation before the curve (c) commands sets the transformation matrix to COSInt{10},COSInt{0},COSInt{0},COSInt{10},COSInt{0},COSInt{0}. The full output is at http://pastebin.com/9XaPQQm9.

Next, I used the following code to extract the line and curve commands from the same page, based on code @mkl provided in a related SO question:

  1. Main class: http://pastebin.com/htiULanR
  2. Helper classes:

    a. Class that extends PDFGraphicsStreamEngine: http://pastebin.com/zL2p75ha

    b. Path: http://pastebin.com/d3vXCgnC

    c. Subpath: http://pastebin.com/CxunHPiZ

    d. Segment: http://pastebin.com/XP1Dby6U

    e. Rectangle: http://pastebin.com/fNtHNtws

    f. Line: http://pastebin.com/042cgZBp

    g. Curve: http://pastebin.com/wXbXZdqE

In that code, I printed the CTM using getGraphicsState().getCurrentTransformationMatrix() inside the curveTo() method overridden from the PDFGraphicsStreamEngine class. It shows the CTM as [0.1,0.0,0.0,0.1,0.0,0.0]. So my questions are:

  1. Shouldn't these two CTMs be the same?

  2. Both these CTMs contain a scaling: the first one scales by a factor of 10 and the second one by a factor of 0.1. If I ignore the scaling, I can create an SVG that looks fairly close to the original PDF, but I am confused why that should be. Do I need to consider all transformation matrices before the path instead of only the last one?

rivu

1 Answer

First of all: You say

the last cm operation before the curve (c) commands sets the transformation matrix to COSInt{10},COSInt{0},COSInt{0},COSInt{10},COSInt{0},COSInt{0}.

This is not correct: cm does not set the transformation matrix to its operand values. Instead, it multiplies the operand matrix M with the former current transformation matrix (CTM' = M × CTM) and makes the result the new current transformation matrix, a process also called concatenation. Thus:

  1. Shouldn't these two CTMs be the same?

No, because cm doesn't set, it concatenates!
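In matrix form, using the PDF specification's convention of writing `a b c d e f` as the 3×3 matrix with rows (a b 0), (c d 0), (e f 1), the `10 0 0 10 0 0 cm` before the curve is applied to the CTM `0.1 0 0 0.1 0 0` that is current at that point (see the trace further down), so the result is the identity rather than a 10x scaling:

```latex
\mathrm{CTM}' = M \times \mathrm{CTM} =
\begin{pmatrix} 10 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\times
\begin{pmatrix} 0.1 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
```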

Furthermore, the current transformation matrix (like all other graphics state values!) is changed not only by the explicit setter or concatenation instructions but also by the restore-state instruction Q, which you currently ignore. Thus:

  2. Do I need to consider all transformation matrices before the path instead of only the last one?

You may have to consider more than the last one, but only those not undone by a graphics state restoration.
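A minimal sketch of why the restore matters (not PDFBox code): every cm on page 5 is a uniform scaling of the form `s 0 0 s 0 0`, so this hypothetical ScaleStackDemo class tracks the CTM as a single scale factor and mirrors the q/cm/Q sequence around the large text object.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not PDFBox code: assumes every cm is a uniform
// scaling (s 0 0 s 0 0), so the CTM is tracked as one scale factor.
public class ScaleStackDemo {
    private double scale = 1.0;                    // CTM = scale * identity
    private final Deque<Double> saved = new ArrayDeque<>();

    void q()          { saved.push(scale); }       // save graphics state
    void Q()          { scale = saved.pop(); }     // restore graphics state
    void cm(double s) { scale = s * scale; }       // concatenate, do not set
    double scale()    { return scale; }

    public static void main(String[] args) {
        ScaleStackDemo gs = new ScaleStackDemo();
        gs.q(); gs.cm(0.1);                // q 0.1 0 0 0.1 0 0 cm
        gs.q(); gs.cm(10);                 // q 10 0 0 10 0 0 cm
        System.out.println(gs.scale());    // 10 * 0.1, i.e. 1.0 inside the text object
        gs.Q();                            // Q undoes the inner cm
        System.out.println(gs.scale());    // back to 0.1
    }
}
```

Because Q pops the value saved by the matching q, the 10x scaling only applies inside that text object. A general tracker would store the full six-value matrix instead of one factor.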


Let's look at your example document...

When you want to keep track of the current transformation matrix, you have to inspect both the cm and the q/Q instructions. In the case of your page 5, the content stream, focusing on those instructions up to the first c (curve) instruction, looks like this:

q 0.1 0 0 0.1 0 0 cm
q
q 10 0 0 10 0 0 cm BT
[...large text object...]
ET Q
Q
q 
[...clip path definition...]
q 10 0 0 10 0 0 cm BT 
[...small text object...]
ET Q
Q
q 
[...new clip path definition...]
0.737761 w
1 i
2086.54 2327.82 m
2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c 

Assuming an initial identity transformation matrix, this implies the following sequence of values for the current transformation matrix and for the transformation matrices saved on the graphics state stack:

CTM: 1 0 0 1 0 0

Stack: empty

q

CTM: 1 0 0 1 0 0

Stack: 1 0 0 1 0 0

0.1 0 0 0.1 0 0 cm

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0

q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

10 0 0 10 0 0 cm

CTM: 1 0 0 1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

BT
[...large text object...]
ET Q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

Q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0

q 

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

[...clip path definition...]
q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

10 0 0 10 0 0 cm

CTM: 1 0 0 1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0 / 0.1 0 0 0.1 0 0

BT 
[...small text object...]
ET Q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

Q

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0

q 

CTM: 0.1 0 0 0.1 0 0

Stack: 1 0 0 1 0 0 / 0.1 0 0 0.1 0 0

[...new clip path definition...]
0.737761 w
1 i
2086.54 2327.82 m
2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c 
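The bookkeeping traced above can also be done mechanically. The following sketch (a hypothetical CtmReplay class, not part of PDFBox) replays only q, Q and cm on the page 5 excerpt with the bracketed placeholders left out. It assumes, as the excerpt suggests, that the omitted parts contain no further q, Q or cm, and it starts from the identity matrix as the trace does.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Sketch, not PDFBox code: replays only q, Q and cm and ignores other operators.
public class CtmReplay {

    // cm: CTM' = M x CTM, both matrices given as {a, b, c, d, e, f}
    static double[] concat(double[] m, double[] t) {
        return new double[] {
            m[0] * t[0] + m[1] * t[2],
            m[0] * t[1] + m[1] * t[3],
            m[2] * t[0] + m[3] * t[2],
            m[2] * t[1] + m[3] * t[3],
            m[4] * t[0] + m[5] * t[2] + t[4],
            m[4] * t[1] + m[5] * t[3] + t[5]
        };
    }

    static double[] replay(String content) {
        double[] ctm = {1, 0, 0, 1, 0, 0};          // assumed identity at start
        Deque<double[]> stack = new ArrayDeque<>();
        List<Double> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            switch (token) {
                case "q": stack.push(ctm.clone()); break;   // save
                case "Q": ctm = stack.pop(); break;         // restore
                case "cm": {
                    double[] m = new double[6];
                    for (int i = 0; i < 6; i++) {
                        m[i] = operands.get(operands.size() - 6 + i);
                    }
                    ctm = concat(m, ctm);                   // concatenate, do not set
                    break;
                }
                default:
                    try {
                        operands.add(Double.parseDouble(token));
                        continue;                           // keep collecting operands
                    } catch (NumberFormatException e) {
                        // other operators (BT, ET, w, i, m, c, ...) are ignored
                    }
            }
            operands.clear();                               // an operator consumes its operands
        }
        return ctm;
    }

    public static void main(String[] args) {
        String page5 = String.join("\n",
            "q 0.1 0 0 0.1 0 0 cm", "q", "q 10 0 0 10 0 0 cm BT", "ET Q", "Q",
            "q", "q 10 0 0 10 0 0 cm BT", "ET Q", "Q",
            "q", "0.737761 w", "1 i", "2086.54 2327.82 m",
            "2088.17 2327.59 2089.82 2327.47 2091.46 2327.47 c");
        System.out.println(Arrays.toString(replay(page5)));
    }
}
```

Running main prints `[0.1, 0.0, 0.0, 0.1, 0.0, 0.0]`, the last CTM in the trace.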

Thus, PDFBox is correct when you observe:

I printed the CTM using getGraphicsState().getCurrentTransformationMatrix() inside the curveTo() method that is overridden from PDFGraphicsStreamEngine class. That shows the CTM as [0.1,0.0,0.0,0.1,0.0,0.0]

mkl
  • Thanks so much for the wonderful answer. But if I take the transformation matrix to be [0.1,0,0,0.1,0,0] at the point the curve is painted, that would scale the path to 1/10th, right? Somehow that doesn't seem visually right; see the SVG I attached in the post, which I created *without* applying the transformation. – rivu Jun 24 '16 at 15:04
  • But in your svg you have coordinate values in the range 200..700 while the coordinate values in the PDF content stream are 2000 upwards. Thus, you seem to have created it *with* a 1/10th scaling transformation. – mkl Jun 24 '16 at 15:48
  • Oh, so when I override a method, say *lineTo(x, y)*, from PDFGraphicsStreamEngine, x and y are already transformed? I don't have to apply the transformations myself? – rivu Jun 24 '16 at 15:52
  • @rivu Yes, the coordinates you retrieve in your lineTo implementation are the result of applying the current transformation matrix to the coordinate values from the content stream. – mkl Jun 24 '16 at 15:58
  • Oh thank you so much. I guess I understand now. Thanks for being so patient with me :). – rivu Jun 24 '16 at 16:00
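To connect this with the SVG: applying the CTM `0.1 0 0 0.1 0 0` to the first m point of the stream yields coordinates in the 200..700 range the SVG uses. As far as I know, PDFBox's built-in path operator handlers call PDFStreamEngine.transformedPoint() before passing coordinates on to moveTo()/lineTo()/curveTo(), which is why the values arrive already transformed. The arithmetic itself, x' = a·x + c·y + e and y' = b·x + d·y + f, is sketched in the hypothetical ApplyCtm class below.

```java
import java.util.Locale;

// Sketch, not PDFBox code: applies a CTM {a, b, c, d, e, f} to a point using
// x' = a*x + c*y + e and y' = b*x + d*y + f.
public class ApplyCtm {
    static double[] transform(double[] ctm, double x, double y) {
        return new double[] {
            ctm[0] * x + ctm[2] * y + ctm[4],
            ctm[1] * x + ctm[3] * y + ctm[5]
        };
    }

    public static void main(String[] args) {
        double[] ctm = {0.1, 0, 0, 0.1, 0, 0};          // CTM at the first c on page 5
        double[] p = transform(ctm, 2086.54, 2327.82);  // the m point from the stream
        System.out.println(String.format(Locale.ROOT, "%.3f %.3f", p[0], p[1]));
        // prints 208.654 232.782
    }
}
```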