Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.

pypdf is a pure-Python library built as a PDF toolkit. It is capable of:

extracting document information (title, author, ...),
splitting documents page by page,
merging documents page by page,
cropping pages,
merging multiple pages into a single page,
encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory stream objects (StringIO on Python 2, io.BytesIO on Python 3) rather than on files, allowing PDFs to be manipulated in memory. This makes it a useful tool for websites that manage or manipulate PDFs.
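
As a rough sketch of the in-memory workflow described above, assuming Python 3 and a recent pypdf release (3.x or later): the function below builds a small PDF in an io.BytesIO buffer and reads it back without touching the disk. The function name, the page size, and the default title are illustrative choices, not part of the tag description.

```python
import io


def roundtrip_in_memory(page_count=2, title="In-memory example"):
    """Build a PDF in a BytesIO buffer, then read it back from that buffer.

    Hypothetical helper for illustration; returns (number_of_pages, title).
    """
    # Imported inside the function so pypdf is only required when it is called.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for _ in range(page_count):
        # 612 x 792 points is US Letter (a point is 1/72 inch).
        writer.add_blank_page(width=612, height=792)
    writer.add_metadata({"/Title": title})

    # PDF data is binary, so the in-memory stream must be BytesIO.
    buffer = io.BytesIO()
    writer.write(buffer)
    buffer.seek(0)

    reader = PdfReader(buffer)
    return len(reader.pages), reader.metadata.title
```

Calling roundtrip_in_memory(3, "Demo") should hand back the page count and title that were written. In a web application, the same buffer could be returned as the response body instead of being re-read.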

pypdf was inactive from 2010 to 2022. Maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html
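
Since the rename mostly affects the import name, code that has to run in environments with either package installed could select the backend at import time. The following is a minimal sketch of that idea; load_pdf_backend is a hypothetical helper, and it assumes the two packages expose the same class names (such as PdfReader), which holds for the releases named above but not for older PyPDF2 versions.

```python
import importlib


def load_pdf_backend(candidates=("pypdf", "PyPDF2")):
    """Return the first importable module from candidates, or None.

    Hypothetical helper: prefers the maintained pypdf package and falls
    back to the deprecated PyPDF2 name if only that one is installed.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


backend = load_pdf_backend()
if backend is None:
    print("Neither pypdf nor PyPDF2 is installed")
else:
    print("Using PDF backend:", backend.__name__)
```

For new code, importing pypdf directly is the simpler choice; the fallback is only worth it while migrating an existing PyPDF2 code base.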

1451 questions

1 answer

PyPDF2: TypeError: coercing to Unicode: need string or buffer, PdfFileWriter found

Reworking code to include Context Managers via with statements. However I am receiving a Traceback: using Python 2.7 on Windows Traceback (most recent call last): File "CommissionSecurity.py", line 52, in with open(output, 'w') as…

asked Nov 20 '15 at 18:07

AlliDeacon

2 answers

Error in the coding of the characters in reading a PDF

I need to read this PDF. I am using the following code: from PyPDF2 import PdfFileReader f = open('myfile.pdf', 'rb') reader = PdfFileReader(f) content = reader.getPage(0).extractText() f.close() content = ' '.join(content.replace('\xa0', '…

python pdf pypdf

asked Nov 12 '15 at 04:52

macabeus

1 answer

pyPdf extracting info from IndirectObject

I am writing a script that will read the creation and modified dates of pdf files. I am using pyPdf package in Python I have the following code from pyPdf import PdfFileWriter, PdfFileReader input1 =…

python pdf pypdf

asked Sep 30 '15 at 22:56

user4505419

1 answer

Not getting the text from PDF in right format when reading using pyPDF

I was trying to read the PDF document on the following link using the pyPDF package in Python. http://www.hdfcsec.com/Share-Market-Research/Research-Details/StockReports/3011454 I used the following code to read the PDF: ###########Beginning of…

python pdf pypdf

asked Aug 03 '15 at 15:20

Karthik Ganapathy

1 answer

How to reset the output file?

I want to split a long PDF document into many parts, e.g. part 1 comprising pages 3-14, part 2 comprising pages 15-19, part 3 comprising pages 20-27, using PyPDF2. I coded a loop that takes the relevant pages out of the original PDF and saves them…

python pypdf

asked Jul 13 '15 at 10:35

sh_python

1 answer

Start first PDF page a certain distance from top then start at the top on every page after that?

Using XHTMLPDF2 in Python; great tool! I'm generating PDFs to integrate into yet another PDF, so I need the first page to start at a certain height from the top (say 432pt at times, 200pt at others; it's in a variable). Every page after that,…

python css pdf pypdf

asked Jun 11 '15 at 00:35

Miguel Diaz

4 answers

Python - convert pdf to text, encoding error

I tried to convert pdf document to txt file. (example of pdf file link) So I tried like below. But the extracted text is strange like ??챘#?遏?h첨챦_철?‾n?~w??¬?k How can I fix it? #!/usr/bin/python # -*- coding: cp949 -*- # -*- coding: utf-8 -*- # -*-…

python pdf error-handling encoding pypdf

asked Mar 15 '15 at 06:01

user3704652

1 answer

Using PyPDF2 to merge files into multiple output files

Here is the code block that is causing the issue. The loop will append the new file each time, which is not what I am trying to accomplish. For example, outputfile1 is input1.pdf, outputfile2 is input1.pdf + input2.pdf... I am trying to merge…

python python-2.7 pdf pypdf

asked Feb 28 '15 at 14:31

user3482598

1 answer

Converting a PDF file consisting of tables into text document containings tables in Python

I have this pdf file that consists of general tables consisting of names,address,phone number,fax number. I want is : 1) read this file and get the content of each row and put it in data base. i.e get the name from corresponding name column of…

python converters pypdf

asked Feb 25 '15 at 17:57

Kajal Gupta

3 answers

How to split/crop a pdf along the middle using pyPdf

I have a pdf that looks like this and i'd like to crop all the text out, almost right down the middle of the page. I found this script that does something simmilar: def splitHorizontal(): from pyPdf import PdfFileWriter, PdfFileReader input1 =…

python pdf pypdf

asked Dec 06 '14 at 21:12

Jeff

1 answer

pyPDF2 merging error coercing to Unicode

I'm trying to give pypdf some pdfs to merge and it throws a coercing to Unicode error. My code is from PyPDF2 import PdfFileMerger, PdfFileReader import pdfcrowd from django.http import HttpResponse def generate_pdf(request): list_of_pages =…

python django pypdf

asked Nov 22 '14 at 16:39

user3982654

0 answers

crack pdf using python scrypt

i have to write a scrypt (for a university class and i must send to proffesor until sunday) that will crack a pdf file, i have tryied a lot so far but cant make it work my code is: #!/usr/bin/python2.7 from PyPDF2 import PdfFileReader,…

python pdf pypdf

asked Oct 31 '14 at 20:46

dimargy

1 answer

How to use global variables in tkinter and PyPDF2 to merge PDF files

Been using Python for a very short amount of time and can't figure out what is wrong with this code. I can't find any examples that would work for my code, so I'm asking here. import sys import os from PyPDF2 import PdfFileReader, PdfFileMerger,…

python-3.x tkinter pypdf

asked Oct 11 '14 at 01:38

user1113569

1 answer

PyPDF hangs on big drawing

Here is the PDF I'm trying to parse. My code (below) hangs on the following line: content += " ".join(extract.strip().split()) It hangs on page 21, which is a big drawing. I wouldn't mind just skipping pages like this big drawing, but I'm not…

python parsing pypdf

asked Oct 08 '14 at 19:41

Sid Kwakkel

2 answers

(PyPDF2) Attempt to merge PDFs produces error

I've been trying to add a watermark as shown in Add text to Existing PDF using Python, but I keep getting error regarding the pdf data from reportlab. Is it a problem with the input pdf? Setup: Python 3.3 (Anaconda Distribution), Windows 7 from…

python reportlab pypdf

asked Sep 26 '14 at 15:48

Ben Southgate
