Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs as well as merge entire files.

A Pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory file-like objects (StringIO in Python 2, io.BytesIO in Python 3) rather than file streams, allowing PDFs to be manipulated in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. Active maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it has since been deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions
5
votes
3 answers

How to count the number of PDF pages in Python, including blank pages

I tried to print the page count of a PDF document that includes some blank white pages using the pypdf module, but it skips the blank pages and prints the count of the remaining pages. Below is the code: import sys import pyPdf from pyPdf import…
Deepan
  • 59
  • 1
  • 2
5
votes
1 answer

"import decimal" raises errors

Here's the code I am using: import os import decimal from pyPdf import PdfFileReader path = r"E:\python\Real Python\Real Python\Course materials\Chapter 8\Practice files" inputFileName = os.path.join(path,"Pride and Prejudice.pdf") inputFile =…
faraz
  • 2,603
  • 12
  • 39
  • 61
4
votes
0 answers

How to remove bookmark destination document properties

This question is about PDF bookmarks. When bookmarks are created, there is an option to assign a destination page layout (among other things), which the user is encouraged not to set unless there is a real reason to do so. From time to time I run into this kind of…
theta
  • 24,593
  • 37
  • 119
  • 159
4
votes
2 answers

PyPDF2 error "PyCryptodome is required for AES algorithm"

I've got hundreds of PDFs that I need to password-protect. I tried to use PyPDF2 to do that but I got an error: "DependencyError: PyCryptodome is required for AES algorithm". I've tried to google other modules like pikepdf but I found only how to crack…
rammbb
  • 41
  • 1
  • 1
  • 4
4
votes
1 answer

--footer not showing when using wkhtmltopdf within a docker container

I am trying to deploy a Flask (Python) app that uses wkhtmltopdf. Everything works perfectly when it is run in a debug environment; however, when I run it using Docker, it stops showing footers and headers. I suspect it has something to do with the…
4
votes
1 answer

Problem closing files when writing with pypdf: ValueError: I/O operation on closed file

I can't figure this out. This function (part of a class for scraping a website into a PDF) is supposed to merge the PDF files generated from web pages using pypdf. This is the method code: def mergePdf(self,mainname,inputlist=0): """merging the pdf…
alonisser
  • 11,542
  • 21
  • 85
  • 139
4
votes
4 answers

cannot import name 'gTTS' from partially initialized module 'gtts'

ImportError: cannot import name 'gTTS' from partially initialized module 'gtts' (most likely due to a circular import) (C:\Users\Gayathri Sai\PycharmProjects\audibook\gtts.py)
vcscharan
  • 41
  • 3
4
votes
1 answer

Python: how to read a LaTeX-generated PDF with equations

Consider the following article: https://arxiv.org/pdf/2101.05907.pdf It's a typically formatted academic paper with only two pictures in the PDF file. The following code was used to extract the text and equations from the paper: #Related code explanation:…
4
votes
3 answers

Resize pdf pages in Python

I am using Python to crop PDF pages. Everything works fine, but how do I change the page size (width)? This is my crop code: input = PdfFileReader(file('my.pdf', 'rb')) p = input.getPage(1) (w, h) = p.mediaBox.upperRight p.mediaBox.upperRight = (w/4,…
user319854
  • 3,980
  • 14
  • 42
  • 45
4
votes
2 answers

Split PDF files in Python - ValueError: invalid literal for int() with base 10: ''

I am trying to split a huge PDF file into several small PDFs using pyPdf. I was trying with this oversimplified code: from pyPdf import PdfFileWriter, PdfFileReader inputpdf = PdfFileReader(file("document.pdf", "rb")) for i in…
Alejandro
  • 4,945
  • 6
  • 32
  • 30
4
votes
3 answers

How to open a PDF file using PyPDF2

I tried to open a PDF file using pypdf in Google Colab using import PyPDF2 as pdf2 with open("sample.pdf", "r+") as f: pdf = pdf2.PdfFileReader(f) but I get the following error: UnsupportedOperation: can't do nonzero end-relative seeks Changing the…
user13720131
  • 43
  • 1
  • 1
  • 4
4
votes
4 answers

How to add a watermark to all pages of a PDF file with Python?

I'm trying to add a watermark to every page of my PDF file. My PDF file has 58 pages, but my output file contains only the last page of the PDF. This is my code: from PyPDF2 import PdfFileReader, PdfFileWriter watermark_pdf =…
vee
  • 89
  • 2
  • 2
  • 8
4
votes
1 answer

UnsupportedOperation: can't do nonzero end-relative seeks : Python - PyPDF2

Can you guys fix the problem? I'm unable to read an Arabic PDF file. I don't know what the issue is. Thanks import PyPDF2 def main(): with open("arabic_text.pdf", encoding='utf-8') as pdfFile: pdfRead = PyPDF2.PdfFileReader(pdfFile) …
Shrief Nabil
  • 59
  • 1
  • 8
4
votes
2 answers

Python, pyPdf, Adobe PDF OCR error: unsupported filter /lzwdecode

My stuff: python 2.6 64 bit (with pyPdf-1.13.win32.exe installed). Wing IDE. Windows 7 64 bit. I got the following error: NotImplementedError: unsupported filter /LZWDecode When I ran the following code: from pyPdf import PdfFileWriter,…
PatentDeathSquad
  • 543
  • 2
  • 7
  • 16
4
votes
4 answers

How to retrieve ALL pages from PDF as a single string in Python 3 using PyPDF2

In order to get a single string from a multi-paged PDF I'm doing this: import PyPDF2 pdfFileObject = open('sample.pdf', 'rb') pdfReader = PyPDF2.PdfFileReader(pdfFileObject) count = pdfReader.numPages for i in range(count): page =…
Gavrk
  • 295
  • 1
  • 4
  • 16