Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from them.

pypdf is a pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory stream objects (io.BytesIO in Python 3; StringIO in the Python 2 era) rather than on file streams, allowing for PDF manipulation in memory. It is therefore a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. Active maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022 but was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions
4
votes
2 answers

Extract an image from a PDF in python

I'm trying to extract images from a PDF using PyPDF2, but the image my code produces looks very different from what it should. See the example below: But this is how it should really look: Here's the pdf I'm…
Abdel Hana
  • 81
  • 2
  • 8
4
votes
1 answer

Highlight text content in pdf files using python and save a screenshot

I have a list of pdf files and I need to highlight specific text on each page of these files and save a snapshot for each of the text instances. So far I am able to highlight the text and save the entire page of a pdf file as a snapshot. But, I want…
Godfrey
  • 87
  • 1
  • 8
4
votes
1 answer

Python Django PDF Flattening of Form Fields

I have a project where I need to fill out pre-made PDFs and the most logical solution that comes to mind to accomplish this is to make the pre-made PDFs into PDF forms so there are tags where input values are supposed to go, then I can look through…
ViaTech
  • 2,143
  • 1
  • 16
  • 51
4
votes
1 answer

How to render a PyPDF2.PageObject page to a PIL image in python?

Can you help me to render a PDF page opened using PyPDF2 into a PIL image in Python 3?
tairqammar
  • 151
  • 3
  • 10
4
votes
2 answers

Excluding the Header and Footer Contents of a page of a PDF file while extracting text?

Is it possible to exclude the contents of the footers and headers of a page in a PDF file while extracting text from it? These contents are the least important and almost redundant. Note: For extracting the text from the .pdf file, I am using the…
M S
  • 894
  • 1
  • 13
  • 41
4
votes
1 answer

Merging PDF files using Python and PyPDF2 throws a TypeError

I am using Python 3.6.5 to merge PDFs together but am running into a problem. The code below throws a 'TypeError: 'NumberObject' object is not subscriptable' error. What am I doing wrong? When I comment out the line with the merger.append, it…
krazyboi
  • 77
  • 2
  • 12
4
votes
3 answers

Error while image extraction from PDF in python

I am trying to extract all formats of images from pdf. I did some googling and found this page on StackOverflow. I tried this code but I am getting this error: I am using python 3.x and here is the code I am using. I tried to go through comments…
john
  • 85
  • 2
  • 10
4
votes
1 answer

Writing text over a PDF in python3

I am trying to write some string to a PDF file at some position. I found a way to do this and implemented it like this: from PyPDF2 import PdfFileWriter, PdfFileReader import io from reportlab.pdfgen import canvas from reportlab.lib.pagesizes import…
waqasgard
  • 801
  • 7
  • 25
4
votes
1 answer

How to digitally sign PDF documents using Python with an etoken (pen drive)?

How to digitally sign PDF documents using Python? I have an etoken (in pen drive). Additionally, I have created an Excel file using openpyxl and converted it into PDF. Now there is a requirement that I need to add a digital signature to that PDF…
hsvvijay
  • 41
  • 1
  • 1
  • 3
4
votes
2 answers

Convert PDF page to image with PyPDF2 and BytesIO

I have a function that gets a page from a PDF file via PyPDF2 and should convert the first page to a png (or jpg) with Pillow (PIL Fork) from PyPDF2 import PdfFileWriter, PdfFileReader import os from PIL import Image import io # Open PDF Source…
PrimuS
  • 2,505
  • 6
  • 33
  • 66
4
votes
2 answers

Python - Split pdf by pages

I am using PyPDF2 to split a large PDF into pages. The problem is that this process is very slow. This is the code I use: import os from PyPDF2 import PdfFileWriter, PdfFileReader with open(input_pdf_path, "rb") as input_file: input_pdf =…
Montoya
  • 2,819
  • 3
  • 37
  • 65
4
votes
2 answers

Python 3 parse PDF from web

I was trying to get a PDF from a webpage, parse it and print the result to the screen using PyPDF2. I got it working without issues with the following code: with open("foo.pdf", "wb") as f: f.write(requests.get(buildurl(jornal, date,…
Bernardo Meurer
  • 2,295
  • 5
  • 31
  • 52
4
votes
3 answers

How to detect a rotated page in a PDF document in Python?

Given a PDF document with multiple pages, how can I check whether a given page is rotated (-90, 90 or 180º)? Preferably using Python (pdfminer, pyPDF) ... UPDATE: The pages are scanned, and most of each page is composed of text.
Dayvid Oliveira
  • 1,157
  • 2
  • 14
  • 34
4
votes
2 answers

How to add page number to a pdf file?

I've been trying all morning to add page numbers to a pdf document, but I can't figure it out. I'd like to use python, with pyPdf or reportlab. Does anyone have any ideas?
danje
  • 71
  • 1
  • 6
4
votes
3 answers

Does PyPDF2 take any safety measures when opening an unsafe file?

I want to use PyPDF2 (source, docs), but first want to make sure that it is safe to use. I'm unable to find anything in its docs. I want to use it to make sure that uploaded files are valid PDFs. Users are validated, but I'm concerned…
Taylor Hobbs
  • 303
  • 3
  • 13