I am using PyPDF2 to read a PDF file that contains some line drawings, with code like the following:
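from PyPDF2 import PdfFileReader  # legacy (pre-3.0) PyPDF2 API

with open('temp.pdf', 'rb') as f:
    pdf = PdfFileReader(f)
    for page in pdf.pages:
        # '/Contents' is the page's content stream; getData() returns its decoded bytes
        print(page['/Contents'].getData())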
I see page content that looks like this:
q 0.24 0 0 0.24 0 0 cm
/R7 gs
8.5 w
1 J
1 j
0 0 0 RG
2361 118.961 m
2361 3388.96 l
S
2361 3388.96 m
118 3388.96 l
S
...
To me this looks like PostScript that uses aliases for the operators (please correct me if I'm wrong).
I believe I can decipher some of these aliases: for example, "m", "l", and "S" look to me like "newpath moveto", "lineto", and "stroke", respectively. However, it would be a great help if I could look at the alias definitions (bind def), which I assume must be present somewhere at the start of the file.
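To check guesses like these against the stream, it may help to see which operands belong to which operator. The following is a minimal sketch, not part of PyPDF2: the helper names (group_operators, translate) are hypothetical, and the tokenizer assumes the stream contains only numbers, names such as /R7, and alphabetic operators like those in the excerpt above. Strings, arrays, dictionaries, inline images, and operators containing digits or quotes are not handled. The translation applies only the three correspondences guessed in the question, which are unconfirmed here.

```python
import re

# Simplified token pattern (assumption: numbers, /Names and letter operators only).
TOKEN = re.compile(rb"/[^\s/\[\]()<>{}%]*|[-+]?(?:\d+\.?\d*|\.\d+)|[A-Za-z*]+")

# The correspondences guessed in the question, not verified definitions.
GUESSED = {"m": "newpath moveto", "l": "lineto", "S": "stroke"}


def group_operators(data: bytes):
    """Return (operator, operands) pairs in stream order.

    Operands precede their operator; unmatched characters are skipped.
    """
    pairs = []
    operands = []
    for match in TOKEN.finditer(data):
        tok = match.group().decode("latin-1")
        if tok.startswith("/") or tok[0] in "+-.0123456789":
            operands.append(tok)  # a name or number: collect as an operand
        else:
            pairs.append((tok, operands))  # an operator closes the group
            operands = []
    return pairs


def translate(pairs):
    """Rewrite each group with the guessed names; other operators are kept as-is."""
    return [" ".join(ops + [GUESSED.get(op, op)]) for op, ops in pairs]


# The beginning of the content stream quoted above.
SAMPLE = b"""q 0.24 0 0 0.24 0 0 cm
/R7 gs
8.5 w
1 J
1 j
0 0 0 RG
2361 118.961 m
2361 3388.96 l
S
"""

for line in translate(group_operators(SAMPLE)):
    print(line)
```

In practice SAMPLE would be replaced by the bytes returned from getData() in the loop shown earlier.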
I guess this should not be difficult if you know how, but I have not been able to find out how to access this PostScript header information using PyPDF2 (despite reading the docs and searching the web, including StackOverflow).
Could someone tell me? Or am I on the wrong track entirely?