I am trying to extract the page name (shown in the screenshot below) of each page in a batch PDF that was produced by AutoCAD.

I have tried PyMuPDF, PyPDF2 and PDFMiner but I can't seem to find where this info is stored in the PDF document.

Image showing Page Name in a Batch PDF

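import fitz  # PyMuPDF

pdf_location = "TestPDF.pdf"
doc = fitz.open(pdf_location)

# page_count replaces the camelCase pageCount, which newer PyMuPDF releases no longer provide
for i in range(doc.page_count):
    page = doc[i]
    # cont = page.get_text()  # Need help here for the page name
doc.close()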
Martin Thoma
  • A page does not normally have metadata. Maybe in your case this is normal text written at some predefined position. Or there may exist XML metadata associated with the page's object definition. To tell what we have here, I would need the PDF itself. In any case, the requested information can be extracted with PyMuPDF. – Jorj McKie Apr 18 '23 at 08:54
  • PyPDF2 is deprecated, please use `pypdf`. You might want to give [pdfly](https://pypi.org/project/pdfly/) a try. It is an application based on pypdf. If `pdfly meta example.pdf` or `pdfly pagemeta example.pdf 0` (0 means "index 0 page") shows the data you're looking for, then it's easy to do with `pypdf`. – Martin Thoma Apr 21 '23 at 18:23
  • It might help if you could share an example PDF + a couple of pages and what you would expect from those pages. – Martin Thoma Apr 21 '23 at 18:25
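
Until a sample PDF is available, one thing that could be checked is whether the names are stored as bookmarks (outline entries) or as page labels, both of which PyMuPDF can read. The sketch below is only a guess at where the data might live: nothing in the question confirms that the AutoCAD export writes either, and `names_from_toc` and `page_names` are hypothetical helper names, not part of PyMuPDF.

```python
def names_from_toc(toc, page_count):
    """Map 1-based page numbers to the first bookmark title pointing at them."""
    names = {}
    for level, title, page_no, *rest in toc:
        # Bookmarks without a valid target page (e.g. -1) are skipped.
        if 1 <= page_no <= page_count and page_no not in names:
            names[page_no] = title
    return names


def page_names(pdf_path):
    """Collect a candidate name per page: bookmark title first, then page label."""
    import fitz  # PyMuPDF; imported here so names_from_toc works without it

    with fitz.open(pdf_path) as doc:
        # get_toc() yields [level, title, 1-based page number] entries.
        names = names_from_toc(doc.get_toc(), doc.page_count)
        for page in doc:
            number = page.number + 1
            # ASSUMPTION: the export may define page labels; fall back to None.
            label = page.get_label()
            names.setdefault(number, label or None)
    return names
```

Calling `page_names("TestPDF.pdf")` returns a dict keyed by page number. If every value comes back as `None`, the name is probably stored elsewhere, for example as ordinary page text or in the page's object definition, as the first comment suggests.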

0 Answers