Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.

Built as a pure-Python PDF toolkit, it is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, its core features run on any Python platform without depending on external libraries. It can also work entirely on in-memory binary streams such as io.BytesIO objects rather than files on disk, which allows PDFs to be manipulated in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

Development of pypdf was inactive from 2010 to 2022; active maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions
13 votes, 1 answer

Adding nested bookmarks to a PDF using PyPDF2

The documentation for PyPDF2 states that it's possible to add nested bookmarks to PDF files, and the code appears (upon reading) to support this. Adding a bookmark to the root tree is easy (see code below), but I can't figure out what I need to pass…
Snorfalorpagus (reputation 3,348)
13 votes, 3 answers

Cannot install PyPdf 2 module

Trying to install the PyPDF2 module, I downloaded the zip, unzipped it, and executed python setup.py build and python setup.py install, but it seems that it has not been installed: when I try to import it from a Python script, it returns an…
geogeek (reputation 1,274)
13 votes, 7 answers

Whitespace gone from PDF extraction, and strange word interpretation

Using the snippet below, I've attempted to extract the text data from this PDF file. import pyPdf def get_text(path): # Load PDF into pyPDF pdf = pyPdf.PdfFileReader(file(path, "rb")) # Iterate pages content = "" for i in…
Louis Thibault (reputation 20,240)
12 votes, 4 answers

Create outlines/TOC for existing PDF in Python

I'm using pyPdf to merge several PDF files into one. This works great, but I would also need to add a table of contents/outlines/bookmarks to the PDF file that is generated. pyPdf seems to have only read support for outlines. Reportlab would allow…
jphoude (reputation 333)
12 votes, 4 answers

Extract specific pages of PDF and save it with Python

I have some source files and tried to write code that extracts some pages and creates PDF files. I have a list which looks like this: information = [(filename1,startpage1,endpage1), (filename2, startpage2, endpage2),…
SSS (reputation 621)
12 votes, 4 answers

PyPDF2 write doesn't work on some PDF files (Python 3.5.1)

First of all, I am using Python 3.5.1 (32-bit version). I wrote the following program to add a page number on all pages of my PDF files using PyPDF2 and reportlab: #import modules from os import listdir from PyPDF2 import PdfFileWriter,…
Max Eisert (reputation 129)
12 votes, 4 answers

How to close pyPDF "PdfFileReader" Class file handle

This should be a very simple question, but I couldn't find an answer by Google search: how do I close the file handle opened by the pyPDF "PdfFileReader" class? Here is a snippet: import os.path from pyPdf import PdfFileReader fname = 'my.pdf' input =…
romor (reputation 417)
12 votes, 3 answers

pyPdf for IndirectObject extraction

Following this example, I can list all elements in a PDF file: import pyPdf pdf = pyPdf.PdfFileReader(open("pdffile.pdf")) list(pdf.pages) # Process all the objects. print pdf.resolvedObjects Now I need to extract a non-standard object from the…
JuanDeLosMuertos (reputation 4,532)
12 votes, 1 answer

Change metadata of pdf file with pypdf

I'd like to create/modify the title of a PDF document using pypdf. It seems that the title is read-only. Is there a way to access this metadata read/write? If so, a piece of code would be appreciated. Thanks
Baudouin Tamines (reputation 121)
11 votes, 5 answers

How do you shift all pages of a PDF document right by one inch?

I want to shift all the pages of an existing PDF document right by one inch so they can be three-hole punched without hitting the content. The PDF documents will be already generated, so changing the way they are generated is not possible. It appears…
Joe McGrath (reputation 1,481)
10 votes, 1 answer

Python script to remove blank pages using pyPDF

I am trying to write a couple of python scripts using pyPDF to split PDF pages into six separate pages, order them correctly (usually printed front and back, so every other page needs to have its subpages ordered differently), and remove resulting…
rpeck1682 (reputation 172)
10 votes, 2 answers

ValueError: seek of closed file Working on PyPDF2 and getting this error

I am trying to get text out of a pdf file. Below is the code: from PyPDF2 import PdfFileReader with open('HTTP_Book.pdf', 'rb') as file: pdf = PdfFileReader(file) page = pdf.getPage(1) #print(dir(page)) print(page.extractText()) This gives me…
Jeet Singh (reputation 303)
10 votes, 1 answer

PyPDF2 extract empty Text

I am using PyPDF2 to extract text from a PDF. All the examples I found on Google look like my code: import PyPDF2 reader = PyPDF2.PdfFileReader("test2.pdf") page = reader.getPage(0) text =…
nesalexy (reputation 848)
10 votes, 3 answers

Merge Two PDF by PyPDF2 but got error Unexpected destination '/__WKANCHOR_2'

from PyPDF2 import PdfFileMerger, PdfFileReader filepath_list = ['/tmp/abc.pdf','/tmp/xyz.pdf'] merger = PdfFileMerger() for file_name in filepath_list: with open(file_name, 'rb') as f: merger.append(f) merger.write("result.pdf") While merger…
Deval (reputation 126)
10 votes, 4 answers

Change metadata of pdf file with pypdf2

I want to add a metadata key-value pair to the metadata of a PDF file. I found a several-years-old answer, but I think it is way too complicated. I guess there is an easier way today: https://stackoverflow.com/a/3257340/633961 I am not married with…
guettli (reputation 25,042)