I need to extract the PDF version from a PDF document. I tried pdfminer, but it only provides the following information:

  1. PDF Producer
  2. Created
  3. Modified
  4. Application

Below is the code I tried:

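from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument

# Open in binary mode; the with block closes the file afterwards
with open("ibs.servlets.pdf", 'rb') as fp:
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    parser.set_document(doc)
    # doc.info is a list of document information dictionaries
    if len(doc.info) > 0:
        info = doc.info[0]
        print(info)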

Are there any other libraries, apart from pdfminer, that I can use?

Sriram

1 Answer

The PDF version is stored as a comment in the first line of the PDF file. I couldn't find a way to get it with pdfminer, but with PyPDF2 I could retrieve it manually:

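from PyPDF2 import PdfFileReader  # top-level import; PdfFileReader is the PyPDF2 1.x/2.x class name
doc = PdfFileReader('ibs.servlets.pdf')
doc.stream.seek(0)  # Rewind: parsing moved the stream position, and the reader skips the header comment
print(doc.stream.readline().decode())  # The first line holds the %PDF-x.y header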

Output:

%PDF-1.5
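
Since the header is just the first bytes of the file, the same value can be read without any PDF library. Here is a minimal sketch, assuming the header appears within the first 1024 bytes; the function name read_pdf_version is my own and not part of any library:

```python
import re


def read_pdf_version(path):
    """Return the version string from the %PDF-x.y header, or None if not found."""
    # Read only the start of the file; the header is normally the very first line.
    with open(path, "rb") as fp:
        head = fp.read(1024)
    # Match the header comment, e.g. b"%PDF-1.5", and capture the version digits.
    match = re.search(rb"%PDF-(\d+\.\d+)", head)
    return match.group(1).decode("ascii") if match else None
```

Calling read_pdf_version('ibs.servlets.pdf') should then return the version part of the header shown above. Keep in mind that from PDF 1.4 onwards the document catalog may carry a Version entry that overrides the header, so the header is not always the final word.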

Frodon