Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and retrieve text and metadata from them.

A Pure-Python library built as a PDF toolkit. It is capable of:

extracting document information (title, author, ...),
splitting documents page by page,
merging documents page by page,
cropping pages,
merging multiple pages into a single page,
encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory stream objects (StringIO in Python 2, io.BytesIO in Python 3) rather than file streams, allowing for PDF manipulation in memory. It is therefore a useful tool for websites that manage or manipulate PDFs.
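
As a rough illustration of the in-memory use described above, the sketch below builds a small PDF and reads it back without touching the disk. It assumes Python 3 and a current pypdf release (3.x or later), where `io.BytesIO` plays the role of StringIO. The helper names `build_blank_pdf` and `count_pages` are made up for this example and are not part of pypdf.

```python
import io


def build_blank_pdf(page_count):
    """Return the bytes of a PDF containing `page_count` blank pages.

    Hypothetical helper for illustration; not part of pypdf itself.
    """
    # Imported lazily so this module still loads where pypdf is not installed.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for _ in range(page_count):
        # 612 x 792 points is US Letter size.
        writer.add_blank_page(width=612, height=792)

    buffer = io.BytesIO()  # in-memory binary stream instead of a file on disk
    writer.write(buffer)
    return buffer.getvalue()


def count_pages(pdf_bytes):
    """Parse PDF bytes held in memory and return the number of pages."""
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(pdf_bytes))
    return len(reader.pages)
```

With pypdf installed, `count_pages(build_blank_pdf(3))` should return 3. In a web application the same bytes could be sent straight back in a response body instead of being written to a temporary file.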

pypdf was unmaintained from 2010 to 2022. Active maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it has since been deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html
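
For code that has to keep running while a project moves from PyPDF2 to pypdf, one possible approach (a sketch, not an official migration path) is to prefer pypdf and fall back to PyPDF2 only if pypdf is missing. The helper name `load_pdf_module` is hypothetical.

```python
import importlib
import importlib.util


def load_pdf_module():
    """Return the installed pypdf module, else PyPDF2, else None.

    Hypothetical helper: prefers the maintained `pypdf` package and only
    falls back to the deprecated `PyPDF2` name when pypdf is absent.
    """
    for name in ("pypdf", "PyPDF2"):
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is not None:
            return importlib.import_module(name)
    return None


pdf_module = load_pdf_module()
if pdf_module is None:
    print("Neither pypdf nor PyPDF2 is installed")
else:
    print("Using", pdf_module.__name__)
```

With pypdf 3.x or PyPDF2 3.0.0, the returned module exposes the same `PdfReader` and `PdfWriter` classes, so the calling code can stay unchanged once the import is resolved.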

1451 questions

1 answer

Adding nested bookmarks to a PDF using PyPDF2

The documentation for PyPDF2 states that it's possible to add nested bookmarks to PDF files, and the code appears (upon reading) to support this. Adding a bookmark to the root tree is easy (see code below), but I can't figure out what I need to pass…

python pdf pypdf

asked Sep 17 '13 at 17:12

Snorfalorpagus

3 answers

Cannot install PyPdf 2 module

I am trying to install the PyPdf2 module. I downloaded the zip, unzipped it, and ran python setup.py build and python setup.py install, but it does not seem to have been installed: when I try to import it from a Python script, it returns an…

python module importerror pypdf

asked Oct 08 '12 at 11:19

geogeek

7 answers

Whitespace gone from PDF extraction, and strange word interpretation

Using the snippet below, I've attempted to extract the text data from this PDF file. import pyPdf def get_text(path): # Load PDF into pyPDF pdf = pyPdf.PdfFileReader(file(path, "rb")) # Iterate pages content = "" for i in…

python pdf unicode pypdf

asked Jun 18 '12 at 17:16

Louis Thibault

4 answers

Create outlines/TOC for existing PDF in Python

I'm using pyPdf to merge several PDF files into one. This works great, but I would also need to add a table of contents/outlines/bookmarks to the PDF file that is generated. pyPdf seems to have only read support for outlines. Reportlab would allow…

python pdf reportlab pypdf

asked May 27 '11 at 20:38

jphoude

4 answers

Extract specific pages of PDF and save it with Python

I have some sources and tried to write code that extracts some pages and creates PDF files. I have a list which looks like this information = [(filename1,startpage1,endpage1), (filename2, startpage2, endpage2),…

python pdf extract pypdf

asked Jul 28 '18 at 03:25

SSS

4 answers

PyPDF2 write doesn't work on some PDF files (Python 3.5.1)

First of all, I am using Python 3.5.1 (32-bit version). I wrote the following program to add a page number to all pages of my PDF files using PyPDF2 and reportlab: #import modules from os import listdir from PyPDF2 import PdfFileWriter,…

python python-3.x pdf reportlab pypdf

asked Aug 31 '17 at 09:34

Max Eisert

4 answers

How to close pyPDF "PdfFileReader" Class file handle

This should be a very simple question, but I couldn't find an answer with a Google search: how do I close the file handle opened by the pyPDF "PdfFileReader" class? Here is a snippet: import os.path from pyPdf import PdfFileReader fname = 'my.pdf' input =…

python pypdf

asked Dec 12 '10 at 15:09

romor

3 answers

pyPdf for IndirectObject extraction

Following this example, I can list all elements in a pdf file: import pyPdf pdf = pyPdf.PdfFileReader(open("pdffile.pdf")) list(pdf.pages) # Process all the objects. print pdf.resolvedObjects Now I need to extract a non-standard object from the…

python pdf stream pypdf

asked Jan 12 '09 at 18:31

JuanDeLosMuertos

1 answer

Change metadata of pdf file with pypdf

I'd like to create/modify the title of a pdf document using pypdf. It seems that the title is read-only. Is there a way to access this metadata read/write? If so, a piece of code would be appreciated. Thanks

pdf metadata pypdf

asked Apr 04 '10 at 14:19

Baudouin Tamines

5 answers

How do you shift all pages of a PDF document right by one inch?

I want to shift all the pages of an existing pdf document right one inch so they can be three hole punched without hitting the content. The pdf documents will be already generated so changing the way they are generated is not possible. It appears…

c++ python linux pdf pypdf

asked Nov 01 '11 at 22:40

Joe McGrath

1 answer

Python script to remove blank pages using pyPDF

I am trying to write a couple of python scripts using pyPDF to split PDF pages into six separate pages, order them correctly (usually printed front and back, so every other page needs to have its subpages ordered differently), and remove resulting…

python pdf crop pypdf

asked Jun 10 '11 at 17:53

rpeck1682

2 answers

ValueError: seek of closed file Working on PyPDF2 and getting this error

I am trying to get text out of a pdf file. Below is the code: from PyPDF2 import PdfFileReader with open('HTTP_Book.pdf', 'rb') as file: pdf = PdfFileReader(file) page = pdf.getPage(1) #print(dir(page)) print(page.extractText()) This gives me…

python python-3.x pypdf

asked May 05 '19 at 11:21

Jeet Singh

1 answer

PyPDF2 extract empty Text

I am using PyPDF2 to extract text from a PDF. All the examples I found on Google look like my code: import PyPDF2 reader = PyPDF2.PdfFileReader("test2.pdf") page = reader.getPage(0) text =…

python pypdf

asked Apr 10 '19 at 08:48

nesalexy

3 answers

Merge Two PDF by PyPDF2 but got error Unexpected destination '/__WKANCHOR_2'

from PyPDF2 import PdfFileMerger, PdfFileReader filepath_list = ['/tmp/abc.pdf','/tmp/xyz.pdf'] merger = PdfFileMerger() for file_name in filepath_list: with open(file_name, 'rb') as f: merger.append(f) merger.write("result.pdf") While merger…

python odoo odoo-9 pypdf

asked May 05 '18 at 12:27

Deval

4 answers

Change metadata of pdf file with pypdf2

I want to add a metadata key-value pair to the metadata of a pdf file. I found an answer that is several years old, but I think it is way too complicated. I guess there is an easier way today: https://stackoverflow.com/a/3257340/633961 I am not married with…

python pdf pypdf pdf-manipulation

asked Oct 20 '17 at 13:06

guettli
