Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs as well as merge entire files.

A Pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries (some optional features, such as AES encryption, need extra packages). It can also work entirely on in-memory binary streams (io.BytesIO objects in Python 3) rather than files on disk, allowing PDF manipulation in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022; active maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html


1451 questions
10 votes, 1 answer

How to extract text from a Specific Area in a PDF using Python?

I'm trying to extract text from a PDF using Python, and I have done so successfully using PyPDF2 like this: from PyPDF2 import PdfFileReader; reader = PdfFileReader('path.pdf'); page = reader.getPage(0); page.extractText(). This extracts all the text…
Devdatta Tengshe
10 votes, 1 answer

PyPDF2 nested bookmarks with same name not working

When you try to nest several bookmarks with the same name, PyPDF2 does not take it into account. Below is self-contained Python code to test what I mean (you need to have 3 PDF files named a, b and c in the working folder to test it out): from PyPDF2…
Chapo
10 votes, 1 answer

Add in-document link to PDF

I need to programmatically analyze and combine several (hundreds) of PDF documents, and link the pages together in specialized ways. Each PDF includes text in each location where a link belongs, indicating what it should link to. I'm using pdfminer…
Henry Keiter
9 votes, 3 answers

Why does my code not correctly split every page in a scanned PDF?

Update: Thanks to stardt, whose script works! The PDF is a page of another one. I tried the script on the other one, and it also correctly split each PDF page, but the order of page numbers is sometimes right and sometimes wrong. For example, in page…
Tim
9 votes, 2 answers

How to insert a string into a PDF using pypdf?

Sorry, I'm a noob in Python. I need to create a PDF file without using an existing PDF file (i.e. create a brand-new one). I have been googling, and most results are about merging 2 PDFs or creating a new file by copying a particular page from another file... what…
Egy Mohammad Erdin
9 votes, 4 answers

only algorithm code 1 and 2 are supported

I would like to read a PDF file, book.pdf, which is protected with a password (256-bit AES encryption). I know the password. I am using Jupyter Notebook. I get an error: import PyPDF2 reader =…
batmanforever
9 votes, 5 answers

pdf form filled with PyPDF2 does not show in print

I need to fill PDF forms in batch, so I tried to write Python code to do it for me from a CSV file. I used the second answer in this question, and it fills the forms fine; however, when I open the filled forms the answers do not show unless the…
anishtain4
9 votes, 2 answers

Add a bookmark to a PDF with PyPDF2

I'm trying to add a bookmark to a PDF using PyPDF2. I run the following with no problems, but a bookmark is never created. Any thoughts on what I'm doing wrong? The PDF is 2 pages long. from PyPDF2 import PdfFileReader, PdfFileWriter reader =…
rmp2150
9 votes, 1 answer

PdfReadWarning: PdfFileReader stream/file object is not in binary mode

I have many PDF pages that I want to merge into one file. My script is as follows: from PyPDF2 import PdfFileMerger,PdfFileReader filename_list=[] merger = PdfFileMerger() for i in range (0,66): filename='page'+str(i)+'.pdf' if not…
mikayla
9 votes, 3 answers

Extract Text Using PdfMiner and PyPDF2 Merges columns

I am trying to parse the pdf file text using pdfMiner, but the extracted text gets merged. I am using the pdf file from the following link [edit: link was broken / pointed to potential malware] I am good with any type of output (file/string). Here…
user2151334
9 votes, 1 answer

PDF bleed detection

I'm currently writing a little tool (Python + pyPdf) to test PDFs for printer conformity. Alas I already get confused at the first task: Detecting if the PDF has at least 3mm 'bleed' (border around the pages where nothing is printed). I already got…
phryk
8 votes, 4 answers

How to get bookmark's page number

from typing import List from PyPDF2 import PdfFileReader from PyPDF2.generic import Destination def get_outlines(pdf_filepath: str) -> List[Destination]: """Get the bookmarks of a PDF file.""" with open(pdf_filepath, "rb") as fp: …
theta
8 votes, 2 answers

Python - ReportLab and PyPDF edit PDF Issue

I am trying to edit an existing pdf file using PyPDF and ReportLab. When I try to position the red circle and red text, it appears to be hiding behind a white container or something. If I position it anywhere else, it works fine. What is causing…
zachjohn987
8 votes, 3 answers

Select only first page of PDF with PyPDF2

I am trying to strip out only the first page of multiple PDF files and combine them into one file. (I receive 150 PDF files a day; the first page is the invoice, which I need, and the following three to 12 pages are just backup, which I do not need.) So the…
mike horan
8 votes, 3 answers

Is there a way to close the file PdfFileReader opens?

I'm opening a lot of PDFs and I want to delete the PDFs after they have been parsed, but the files remain open until the program is done running. How do I close the PDFs I open using PyPDF2? Code: def getPDFContent(path): content = "" #…
SPYBUG96