Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.

pypdf is built as a PDF toolkit. It is capable of:

extracting document information (title, author, ...),
splitting documents page by page,
merging documents page by page,
cropping pages,
merging multiple pages into a single page,
encrypting and decrypting PDF files.
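
The splitting and merging items above can be sketched roughly as follows. This is a minimal illustration, assuming the `PdfReader` and `PdfWriter` class names used by pypdf 3.x (older pyPdf and PyPDF2 code uses `PdfFileReader` and `PdfFileWriter` instead). The helper names `split_pdf` and `merge_pdfs` and the output file naming are hypothetical, not part of the library.

```python
from pathlib import Path

try:
    from pypdf import PdfReader, PdfWriter
except ImportError:  # pypdf is a third-party package (pip install pypdf)
    PdfReader = PdfWriter = None


def split_pdf(source, out_dir):
    """Write every page of `source` to its own one-page PDF in `out_dir`."""
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    reader = PdfReader(str(source))
    parts = []
    for index, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        # Hypothetical naming scheme: <original stem>_page<N>.pdf
        part = out_dir / f"{source.stem}_page{index}.pdf"
        with part.open("wb") as handle:
            writer.write(handle)
        parts.append(part)
    return parts


def merge_pdfs(sources, target):
    """Concatenate all pages of `sources` into `target`; return the page count."""
    writer = PdfWriter()
    page_count = 0
    for source in sources:
        for page in PdfReader(str(source)).pages:
            writer.add_page(page)
            page_count += 1
    with Path(target).open("wb") as handle:
        writer.write(handle)
    return page_count
```

A call such as `merge_pdfs(split_pdf("input.pdf", "parts"), "rebuilt.pdf")` would split a file and then reassemble it; `input.pdf` is a placeholder name.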

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory streams (StringIO objects in Python 2, io.BytesIO objects in Python 3) rather than file streams, allowing for PDF manipulation in memory. It is therefore a useful tool for websites that manage or manipulate PDFs.
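
As a rough sketch of in-memory use, the following builds a PDF in an `io.BytesIO` buffer and reads it back without touching the disk. It assumes the pypdf 3.x names `PdfReader`, `PdfWriter`, and `add_blank_page`; the helper names `blank_pdf_bytes` and `count_pages` are made up for this example.

```python
import io

try:
    from pypdf import PdfReader, PdfWriter
except ImportError:  # pypdf is a third-party package (pip install pypdf)
    PdfReader = PdfWriter = None


def blank_pdf_bytes(page_count, width=612, height=792):
    """Return the bytes of a PDF made of `page_count` blank pages."""
    writer = PdfWriter()
    for _ in range(page_count):
        # Page size is given in PDF points (612 x 792 is US Letter).
        writer.add_blank_page(width=width, height=height)
    buffer = io.BytesIO()
    writer.write(buffer)
    return buffer.getvalue()


def count_pages(pdf_bytes):
    """Count the pages of a PDF held entirely in memory."""
    return len(PdfReader(io.BytesIO(pdf_bytes)).pages)
```

In a web application, bytes received from an upload could be passed to `count_pages` directly, and bytes produced by a writer could be returned as the response body.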

pypdf was inactive from 2010 to 2022. Maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022 but was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name changed to pypdf.
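
Since only the package name differs between those two releases, code that has to run in environments with either package installed could pick whichever is available. The sketch below is one possible approach, not an official migration recipe; `load_pdf_module` is a hypothetical helper name.

```python
import importlib


def load_pdf_module():
    """Return the first importable module among pypdf and PyPDF2, or None."""
    for name in ("pypdf", "PyPDF2"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


pdf_module = load_pdf_module()
if pdf_module is None:
    print("Neither pypdf nor PyPDF2 is installed")
else:
    # Both packages expose a __version__ attribute; getattr guards against odd builds.
    version = getattr(pdf_module, "__version__", "unknown")
    print(f"Using {pdf_module.__name__} {version}")
```

For new code, importing from `pypdf` directly is the simpler choice, because PyPDF2 is deprecated.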

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions

3 answers

How to count the number of pdf pages in python that has blank pdf page also

I tried to print the page count of a PDF document that includes some blank white pages using the pypdf module, but it skips the blank pages and prints the count of the remaining pages. Below is the code. import sys import pyPdf from pyPdf import…

python-2.7 pypdf

asked May 20 '13 at 10:41

Deepan

1 answer

"import decimal" raises errors

here's the code I am using import os import decimal from pyPdf import PdfFileReader path = r"E:\python\Real Python\Real Python\Course materials\Chapter 8\Practice files" inputFileName = os.path.join(path,"Pride and Prejudice.pdf") inputFile =…

python module decimal pypdf

asked Mar 12 '13 at 06:57

faraz

0 answers

How to remove bookmark destination document properties

Question is about PDF bookmarks. When bookmarks are created, there is option to assign destination page layout (among other things) which user is encouraged not to set unless there is really reason to do so. From time to time I run to this kind of…

python pdf pypdf

asked Nov 27 '11 at 14:59

theta

2 answers

PyPDF2 error "PyCryptodome is required for AES algorithm"

I've got hundreds of PDFs that I need to set a password on. I tried to use PyPDF2 to do that but I got an error: "DependencyError: PyCryptodome is required for AES algorithm". I've tried to google any other module like pikepdf but I found only how to crack…

python pypdf pikepdf

asked Sep 13 '22 at 09:59

rammbb

1 answer

--footer not showing when using wkhtmltopdf within a docker container

I am trying to deploy a flask (python) app that uses wkhtmltopdf. Everything works perfectly when it is run in a debug environment, however when I run it using docker, it stops showing footers and headers. I suspect it has something to do with the…

python flask pdf wkhtmltopdf pypdf

asked Nov 28 '21 at 21:53

Folded Panda

1 answer

Problem with closing python pypdf - writing. Getting a ValueError: I/O operation on closed file

I can't figure this out. This function (part of a class for scraping a website into a PDF) is supposed to merge the PDF files generated from web pages using pypdf. This is the method code: def mergePdf(self,mainname,inputlist=0): """merging the pdf…

python pypdf

asked Jul 21 '11 at 09:02

alonisser

4 answers

cannot import name 'gTTS' from partially initialized module 'gtts'

ImportError: cannot import name 'gTTS' from partially initialized module 'gtts' (most likely due to a circular import) (C:\Users\Gayathri Sai\PycharmProjects\audibook\gtts.py)

python pypdf gtts

asked Mar 22 '21 at 06:13

vcscharan

1 answer

Python how to read a latex generated pdf with equations

Consider the following article https://arxiv.org/pdf/2101.05907.pdf It's a typically formatted academic paper with only two pictures in pdf file. The following code was used to extract the text and equation from the paper #Related code explanation:…

python pdf text latex pypdf

asked Mar 03 '21 at 09:18

ShoutOutAndCalculate

3 answers

Resize pdf pages in Python

I am using python to crop pdf pages. Everything works fine, but how do I change the page size(width)? This is my crop code: input = PdfFileReader(file('my.pdf', 'rb')) p = input.getPage(1) (w, h) = p.mediaBox.upperRight p.mediaBox.upperRight = (w/4,…

python pdf pypdf

asked Jun 30 '11 at 14:42

user319854

2 answers

Split PDF files in python - ValueError: invalid literal for int() with base 10: '' "

I am trying to split a huge pdf file into several small pdfs using pyPdf. I was trying with this oversimplified code: from pyPdf import PdfFileWriter, PdfFileReader inputpdf = PdfFileReader(file("document.pdf", "rb")) for i in…

python pdf pypdf

asked Jun 18 '11 at 04:11

Alejandro

3 answers

how to open pdf file using pypdf2

I tried to open a pdf file using pypdf in Google Colab using import PyPDF2 as pdf2 with open("sample.pdf", "r+") as f: pdf = pdf2.PdfFileReader(f) but I get following error: UnsupportedOperation: can't do nonzero end-relative seeks Changing the…

python pypdf

asked Jun 10 '20 at 11:00

user13720131

4 answers

How to add watermark in all pages of PDF files with python?

I'm trying to add a watermark to every page of my PDF file. My PDF file has 58 pages, but my output file contains only the last page of my PDF file. This is my code: from PyPDF2 import PdfFileReader, PdfFileWriter watermark_pdf =…

python pypdf

asked Jun 08 '20 at 11:46

vee

1 answer

UnsupportedOperation: can't do nonzero end-relative seeks : Python - PyPDF2

Can you guys fix the problem? I'm unable to read an arabic PDF file. I don't know what is the issue. Thanks import PyPDF2 def main(): with open("arabic_text.pdf", encoding='utf-8') as pdfFile: pdfRead = PyPDF2.PdfFileReader(pdfFile) …

python pypdf

asked Apr 29 '20 at 21:56

Shrief Nabil

2 answers

Python, pyPdf, Adobe PDF OCR error: unsupported filter /lzwdecode

My stuff: python 2.6 64 bit (with pyPdf-1.13.win32.exe installed). Wing IDE. Windows 7 64 bit. I got the following error: NotImplementedError: unsupported filter /LZWDecode When I ran the following code: from pyPdf import PdfFileWriter,…

python ocr pypdf

asked May 19 '11 at 02:13

PatentDeathSquad

4 answers

How to retrieve ALL pages from PDF as a single string in Python 3 using PyPDF2

In order to get a single string from a multi-paged PDF I'm doing this: import PyPDF2 pdfFileObject = open('sample.pdf', 'rb') pdfReader = PyPDF2.PdfFileReader(pdfFileObject) count = pdfReader.numPages for i in range(count): page =…

python python-3.x pdf pypdf pdf-extraction

asked Feb 13 '20 at 01:03

Gavrk
