Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and retrieve text and metadata from them.

A Pure-Python library built as a PDF toolkit. It is capable of:

extracting document information (title, author, ...),
splitting documents page by page,
merging documents page by page,
cropping pages,
merging multiple pages into a single page,
encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without dependencies on external libraries. It can also work entirely on in-memory binary streams (io.BytesIO in Python 3; StringIO in the original Python 2 library) rather than files on disk, allowing for PDF manipulation in memory. It is therefore a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. Maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received a lot of updates in 2022, but it has since been deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.
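
Because only the package name differs between those two releases, code that must run in mixed environments may first want to check which name is installed. The sketch below uses only the standard library; the function name `installed_pdf_package` is hypothetical.

```python
from importlib.util import find_spec


def installed_pdf_package():
    """Return "pypdf" or "PyPDF2" if importable (preferring pypdf), else None."""
    for name in ("pypdf", "PyPDF2"):
        # find_spec returns None when a top-level package cannot be found.
        if find_spec(name) is not None:
            return name
    return None
```

Preferring `pypdf` here reflects the deprecation of PyPDF2 described above.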

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions

1 answer

Reading/writing XMP metadata on PDF files through pypdf

I can read XMP metadata through pypdf with this code: from pypdf import PdfReader a = PdfReader("file.pdf") b = a.xmp_metadata c = b.pdf_keywords But is this the best way? And what if I don't use the pdf_keywords property? Is there any way to set…

asked Jan 21 '09 at 19:43

JuanDeLosMuertos

3 answers

PyPDF2 split pdf by pages

I want to split a PDF file using PyPDF2. All the examples on the net are too difficult, don't work, or always give the error "AttributeError: 'PdfFileWriter' object has no attribute 'stream'". Can someone help with it? I need to separate one PDF with 3 pages into three…

python pypdf

asked Jul 17 '17 at 12:21

Acamori

7 answers

Reading pdf files line by line using python

I used the following code to read the pdf file, but it does not read it. What could possibly be the reason? from PyPDF2 import PdfFileReader reader = PdfFileReader("example.pdf") contents =…

python pypdf

asked Jul 08 '17 at 04:10

Rahul Pipalia

3 answers

PdfFileReader: PdfReadError: Could not find xref table at specified location

I am trying to read a PDF file in Python through: from PyPDF2 import PdfFileReader, PdfFileWriter test_reader = PdfFileReader(file("test.pdf", "rb")) The above line throws the error: PyPDF2.utils.PdfReadError: Could not find xref table at specified…

python pypdf

asked Dec 05 '15 at 12:20

Nitin Bhojwani

2 answers

how to iterate over all the objects in a PDF page and check which ones are text objects?

I want to iterate over all the objects in a page of a PDF using pypdf. I also want to check the type of each object, whether it is text or graphics. A code snippet would be a great help. Thanks a lot

python pypdf

asked Oct 20 '12 at 08:15

Shan

1 answer

pyPdf error invalid argument

I'm actually using pyPdf to open, read and write the content of a PDF file. For that I use these lines of code: from pyPdf import PdfFileWriter, PdfFileReader pdf = PdfFileReader(file("/myPdfFile.pdf", "w+b")) content =…

python file pdf pypdf invalid-argument

asked May 22 '12 at 17:09

kschaeffler

1 answer

PyPDF Merge and Write issue

I am getting an unexpected error when using this. The first section is from a script that I found online, and I am trying to use it to pull a particular section identified in the PDF's outline. Everything works fine, except right at…

python pdf merge pypdf

asked Sep 29 '11 at 19:50

user971847

2 answers

pyPDF merging and displaying as httpresponse through django

I'm having trouble incorporating pyPDF logic to merge two PDF files into my Django site. I have written code that works to merge files when run in a Python file on the local server (but I need to explicitly identify which files to merge): from pyPdf…

django pdf django-admin pypdf

asked Aug 19 '11 at 15:34

Joseph

0 answers

Python: Reading PDF with PyPDF2 results in Superfluous whitespace error

I've been struggling with reading text from a PDF in Python. What I need is for PyPDF2 to find a given string and return a reference number placed next to that string. This is the code I'm trying: import os import shutil import PyPDF2 from PyPDF2…

python whitespace pypdf

asked May 13 '21 at 11:40

darkspeed

1 answer

error: Unable to find trailer dictionary while recovering damaged file

PyPDF2 sometimes fails to decrypt some PDF files, and I am trying to decrypt them with pikepdf, but I am getting this error: Unable to find trailer dictionary while recovering damaged file. Any ideas?

python django pypdf pikepdf

asked May 10 '20 at 22:19

Yordan

1 answer

Change font type/size using PDF annotations

I'm writing data to a PDF with named fields and then changing the attributes of those fields to make them readonly. This is great, but I'd like to be able to manipulate the text as well, change the font size, maybe even the font itself. According to…

python pypdf

asked Mar 18 '20 at 20:20

markwalker_

5 answers

How to merge two landscape pdf pages using pyPdf

I'm having trouble merging two PDF files with pyPdf. When I run the following code the watermark (page1) looks fine, but page2 has been rotated 90 degrees clockwise. Any ideas what's going on? from pyPdf import PdfFileWriter,…

python pdf-generation landscape pypdf

asked May 18 '11 at 07:26

Humphrey

1 answer

Correcting PDF pages with wrong orientation information with PyPDF2

I'm trying to merge a number of PDF documents into one. However, the documents have different sources, some of them created on the computer, some of them scanned with different scanners and software. I'm scaling them all to A4 size before joining…

python pypdf

asked Aug 01 '19 at 15:51

Gustavo Seabra

2 answers

How to stitch two pdf pages into one in python

I am using python, and I want to combine two PDF pages into a single page. My purpose is to combine these two pages into one, not two PDFs. Is there any way to combine the two PDFs one by one? I don't want to merge these two. Without overlapping,…

python pdf pypdf

asked Apr 13 '19 at 00:07

tins johny

1 answer

convert from pdf to text: lines and words are broken

I want to convert a PDF file to text with PyPDF2, but the converted text looks different from the PDF file. Specifically, one line in the PDF is broken into multiple lines in the text, and words may be broken as well. Attached is the PDF and the text file I got with…

python python-3.x pypdf

asked Mar 18 '19 at 11:36

hongftu
