Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs as well as merge entire files.

pypdf is a pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory stream objects (StringIO in the original pyPdf; io.BytesIO on Python 3) rather than files on disk, allowing for PDF manipulation in memory. It is therefore a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. Maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions
3 votes, 0 answers

How to change font in pdf file using pyPDF2 in python

How can I change the font with the PyPDF2 module? I tried print(help(canvas.Canvas)), and I tried initialFontName = None and initialFontSize = None, but my text didn't change. Also, I'm doing this on a Raspberry Pi running the Raspbian operating system. Here's my…
tin tinie
  • 63
  • 1
  • 8
3 votes, 1 answer

How to extract only paragraphs from pdf using python or java?

We are able to extract the entire text from a PDF using PyPDF2 and PDFBox, but we are not able to fetch only the paragraphs.
Ashok Kuramdasu
  • 313
  • 4
  • 15
3 votes, 2 answers

PyPDF2 PdfFileMerger losing PDF module in merged file

I am merging PDF files with PyPDF2, but when one of the files contains a PDF Module filled with data (a typical application-filled PDF), the module is empty in the merged file and no data is shown. Here are the two methods I am using to merge the…
A_E
  • 175
  • 11
3 votes, 1 answer

Extracting text from formatted PDF using python

I have to parse a formatted PDF to get some fields. The PDF is here. What I need to parse is shown in this imgur. I have used PyPDF2 to get the text, but it returns raw text without any formatting. import PyPDF2 pdfFileObj =…
Ayyan Khan
  • 507
  • 2
  • 12
3 votes, 2 answers

Script to loop through and match files based on file name and append

I have a directory with many files that are named like: 1234_part1.pdf 1234.pdf 5432_part1.pdf 5432.pdf 2323_part1.pdf 2323.pdf etc. I am trying to merge the PDFs where the first numeric part of the file name is the same. I have code that can do this…
Lulumocha
  • 143
  • 8
3 votes, 1 answer

module 'PyPDF2' has no attribute 'PdfFileReader'

I am following along with the book "Automate the Boring Stuff with Python", but I receive an error when trying to run this simple script. import PyPDF2 pdfFileObj = open('meetingminutes.pdf', 'rb') pdfReader = PyPDF2.PdfFileReader(pdfFileObj) The…
BlueGold71
  • 33
  • 1
  • 1
  • 4
3 votes, 2 answers

"PDF File has not been decrypted" issue still persists in PyPDF2

I am getting the following errors while reading PDF files using PyPDF2: raise utils.PdfReadError("File has not been decrypted") PdfReadError: File has not been decrypted. I have been trying to read PDF documents programmatically through Python. For most…
venkatttaknev
  • 669
  • 1
  • 7
  • 21
3 votes, 4 answers

How do I reverse the order of the pages in a pdf file using pyPdf?

I have a pdf file "myFile.pdf". I would like to reverse the order of its pages using pyPdf. How?
snakile
  • 52,936
  • 62
  • 169
  • 241
3 votes, 2 answers

How to search all file types in directory for regular expression

So, I want to search my whole directory for files that match a list of regular expressions. That includes: directories, PDFs, and CSV files. I can successfully do this task when searching only text files, but searching all file types is the…
James Davinport
  • 303
  • 7
  • 19
3 votes, 2 answers

Can't get text out of PDF file with PyPDF2

I am trying to get the text from a PDF file I downloaded with PyPDF. Here is my code: if not PyPDF2.PdfFileReader('download.pdf').isEncrypted: PyPDF2.PdfFileReader('download.pdf').getPage(0).extractText() This is the…
3 votes, 1 answer

How to convert the extracted text from PDF to JSON or XML format in Python?

I am using PyPDF2 to extract the data from a PDF file and then converting it into text format. The PDF format for the file is like this: Name : John Address: 123street , USA Phone No: 123456 Gender: Male Name : Jim Address: 456street , USA Phone…
Avi
  • 1,795
  • 3
  • 16
  • 29
3 votes, 2 answers

Unable to read pdf file using Pypdf. It's showing output in bytecode

Can anyone help me out? Thanks in Advance. Code : from PyPDF2 import PdfFileReader def text_extractor(path): with open(path, 'rb') as f: pdf = PdfFileReader(f) page = pdf.getPage(2) print(page) text =…
sridhar er
  • 124
  • 7
3 votes, 2 answers

How to draw a paragraph from top to bottom on canvas?

I have been trying to create a PDF using PyPDF2 and ReportLab. I need to draw a flowable paragraph with a huge chunk of text. The problem is that the size of the paragraph may vary. I want to keep the top-left corner (start of the paragraph) of the…
sajid
  • 807
  • 1
  • 9
  • 23
3 votes, 2 answers

PyPDF2, why am I getting an index error? List index out of range

I'm following along in Al Sweigart's book 'Automate the Boring Stuff' and I'm at a loss with an index error I'm getting. I'm working with PyPDF2, trying to open an encrypted PDF document. I know the book is from 2015 so I went to the…
User67
  • 61
  • 1
  • 5
3 votes, 1 answer

How to extract specific text from pdf file - python

I am trying to extract this text: DLA LAND AND MARITIME ACTIVE DEVICES DIVISION PO BOX 3990 COLUMBUS OH 43218-3990 USA Name: Desmond Forshey Buyer Code:PMCMTA9 Tel: 614-692-6154 Fax: 614-692-6930 Email: Desmond.Forshey@dla.mil from this pdf…
jone2
  • 191
  • 1
  • 4
  • 18