Get Metadata for each page from a batch PDF

Asked Apr 18 '23 at 08:43

Active Apr 21 '23 at 18:20


I am trying to extract the page name (shown in the screenshot below) of each page in a batch PDF produced by AutoCAD.

I have tried PyMuPDF, PyPDF2 and PDFMiner but I can't seem to find where this info is stored in the PDF document.

[Image: page name shown in a batch PDF]

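import fitz  # PyMuPDF

pdfLocation = "TestPDF.pdf"
doc = fitz.open(pdfLocation)

# page_count replaces the camelCase alias pageCount, which newer PyMuPDF releases no longer provide
for i in range(doc.page_count):
    page = doc[i]
    # cont = page.get_text()  # Need help here for the page name
doc.close()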

edited Apr 21 '23 at 18:20

Martin Thoma

asked Apr 18 '23 at 08:43

Stark Arpit

A page does not normally have metadata. Maybe in your case this is normal text written at some predefined position. Or there may exist XML metadata associated with the page's object definition. To tell what we have here, I would need the PDF itself. In any case, the requested information can be extracted with PyMuPDF. – Jorj McKie Apr 18 '23 at 08:54
PyPDF2 is deprecated, please use `pypdf`. You might want to give [pdfly](https://pypi.org/project/pdfly/) a try. It is an application based on pypdf. If `pdfly meta example.pdf` or `pdfly pagemeta example.pdf 0` (0 means "index 0 page") shows the data you're looking for, then it's easy to do with `pypdf`. – Martin Thoma Apr 21 '23 at 18:23
It might help if you could share an example PDF + a couple of pages and what you would expect from those pages. – Martin Thoma Apr 21 '23 at 18:25
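
Without the PDF itself, where the page name lives is a guess. Two places a viewer commonly takes per-page names from are the document's bookmarks (outline) and its page labels, so the sketch below checks both with PyMuPDF. It assumes the AutoCAD sheet names are stored in one of those two places, which the question does not confirm. `page_names_from_toc` and `read_page_names` are hypothetical helper names; `Document.get_toc()` and `Page.get_label()` are the PyMuPDF calls used.

```python
def page_names_from_toc(toc, page_count):
    """Map each 0-based page index to the first bookmark title pointing at it.

    `toc` is a list of [level, title, page_number] entries with 1-based page
    numbers, the shape returned by PyMuPDF's Document.get_toc().
    """
    names = {}
    for _level, title, page_number in toc:
        index = page_number - 1
        # Skip bookmarks without a valid target page; keep the first title per page.
        if 0 <= index < page_count and index not in names:
            names[index] = title
    return names


def read_page_names(pdf_path):
    """Return {page_index: name}, trying bookmarks first, then page labels.

    Assumption: the sheet names are stored as bookmarks or page labels.
    """
    import fitz  # PyMuPDF; imported here so the helper above works without it

    with fitz.open(pdf_path) as doc:
        names = page_names_from_toc(doc.get_toc(), doc.page_count)
        for page in doc:
            label = page.get_label()  # empty string when no label is defined
            if page.number not in names and label:
                names[page.number] = label
    return names
```

Calling `read_page_names("TestPDF.pdf")` returns a dictionary keyed by 0-based page index. If it comes back empty, the name is probably plain text at a fixed position on the page or XML metadata attached to the page object, as the first comment suggests, and would need a different approach.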

0 Answers