Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs and merge entire files.

pypdf is a pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory stream objects (StringIO in the original pyPdf, io.BytesIO in Python 3) rather than files on disk, allowing for PDF manipulation in memory. It is therefore a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. Active maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions
5
votes
1 answer

How to remove annotations in pdf in Python 3

My original goal was to remove the extensive white margins on my PDF pages. Then I found this purpose can be achieved by scaling the page using the code below, but annotations are not scaled. import PyPDF2 # This works fine with open('old.pdf',…
n33
  • 383
  • 4
  • 13
5
votes
5 answers

Python PDF read straight across as how it looks in the PDF

If I use the code in the answer here: Extracting text from a PDF file using PDFMiner in python? I can get the text to extract when applying to this pdf: https://www.tencent.com/en-us/articles/15000691526464720.pdf However, you see under…
jason
  • 3,811
  • 18
  • 92
  • 147
5
votes
1 answer

Place a vertical or rotated text in a PDF with Python

I'm currently generating a PDF with PyFPDF. I also need to add a vertical/rotated text. Unfortunately, it's not directly supported in PyPDF2 as far as I see. There are solutions for FPDF for PHP. Is there a way to insert vertical or rotated text in…
Michael
  • 7,407
  • 8
  • 41
  • 84
5
votes
2 answers

Merge 2 pdf files giving me an empty pdf

I am using the following standard code: # importing required modules import PyPDF2 def PDFmerge(pdfs, output): # creating pdf file merger object pdfMerger = PyPDF2.PdfFileMerger() # appending pdfs one by one for pdf in pdfs: …
HolyMonk
  • 432
  • 6
  • 17
5
votes
0 answers

Python 3 library to merge any image into PDF

In python 3, I have a list of images of various formats (pdf, png, jpg, gif), and I'm merging them all into one multi-page pdf. Using PyPDF2, PDF files can be merged. But png, jpg, etc, are not supported. This is very well covered here: Merge PDF…
Raf
  • 1,628
  • 3
  • 21
  • 40
5
votes
1 answer

Batch rotate PDF files with PyPDF2

I've been working on a code to batch rotate PDF files inside a folder, but I can't find a way to iterate and change the destination folder of the rotated file. My intention is to save the new file with the same name in another folder. from os import…
fcr
  • 125
  • 1
  • 1
  • 11
5
votes
3 answers

How to get PDF file metadata 'Page Size' using Python?

I try to use PyPDF2 module in Python 3 but I can't display 'Page Size' property. I would like to know what the sheet of paper dimensions were before scanning to PDF file. Something like this: import…
Mirek
  • 63
  • 1
  • 6
5
votes
2 answers

How to erase text from PDF using Python

I'm creating a python script to edit text from PDFs. I have this Python code which allows me to add text into specific positions of a PDF file. import PyPDF2 import io from reportlab.pdfgen import canvas from reportlab.lib.pagesizes import…
Gabriel Belini
  • 760
  • 1
  • 13
  • 32
5
votes
2 answers

PyPDF2 returning blank PDF after copy

def EncryptPDFFiles(password, directory): pdfFiles = [] success = 0 # Get all PDF files from a directory for folderName, subFolders, fileNames in os.walk(directory): for fileName in fileNames: if…
stryker14
  • 53
  • 1
  • 7
5
votes
0 answers

How to append content to a PDF using pypdf and preserve the past versions

PDF supports document versions. That means that the current document can be kept intact, and we can change the content and presentation of the document just adding info. That feature is specially useful to verify the look and integrity of the…
yucer
  • 4,431
  • 3
  • 34
  • 42
5
votes
1 answer

Adding comment to PDFs using Python?

I am facing this task of adding comments to PDFs. Specifically, the task is to add a sticky note box at the beginning of the file and add a few lines of text in the sticky note box. I need to do this repetitively for a bulk number of PDFs so I am…
Allen Lin
  • 1,179
  • 4
  • 13
  • 23
5
votes
1 answer

Merging two PDFs

import PyPDF2 import glob import os from fpdf import FPDF import shutil class MyPDF(FPDF): # adding a footer, containing the page number def footer (self): self.set_y(-15) self.set_font("Arial", Style="I", size=8) …
Lynob
  • 5,059
  • 15
  • 64
  • 114
5
votes
1 answer

PyPDF2: how to add a footer to a pdf?

In PyPDF2, how to add a footer to every page of a pdf file? Do I have to do something like page5 = reader.pages[4] page5.mediabox.right = page5.mediabox.right / 4 page5.mediabox.top = page5.mediabox.top / 4 writer.add_page(page5) or is there a…
Lynob
  • 5,059
  • 15
  • 64
  • 114
5
votes
1 answer

Adding information to pdf, PyPDF2 merging too slow

I want a text on each page of a pdf. This text is a html code that looks like blabla as to appear red on the final doc, I convert it in pdf (html2pdf lib) then I merge it (PyPDF2 lib) to each page of my pdf. ...but the…
J'hack le lezard
  • 413
  • 7
  • 23
5
votes
1 answer

python and pyPdf - how to extract text from the pages so that there are spaces between lines

currently, if I make a page object of a pdf page with pyPdf, and extractText(), what happens is that lines are concatenated together. For example, if line 1 of the page says "hello" and line 2 says "world" the resulting text returned from…
Tony Stark
  • 24,588
  • 41
  • 96
  • 113