Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, retrieve text and metadata from PDFs, and merge entire files together.

A Pure-Python library built as a PDF toolkit. It is capable of:

extracting document information (title, author, ...),
splitting documents page by page,
merging documents page by page,
cropping pages,
merging multiple pages into a single page,
encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory binary streams (io.BytesIO objects in Python 3) rather than files on disk, allowing PDFs to be manipulated in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. Maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it has since been deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html


1451 questions

-1

votes

1 answer

IndexError: list index out of range in pypdf2 extract_text in specific pdf file

I have tried: from PyPDF2 import PdfReader input_pdf = PdfReader(open("pdfFile.pdf", "rb")) thispage = input_pdf.pages[0] print(thispage.extract_text()) And I got the following error: Traceback (most recent call last): File…

asked Feb 24 '23 at 17:12

Tomás Gomez Pizarro

-1

votes

2 answers

Check if two sentences contain any matching word using Python

I'm trying to simply check whether two sentences have any similar words. Here's an example: string_one = "Author: James Oliver" string_two = "James Oliver has written this beautiful article which says...." In this case, these two sentences match…

python pypdf

asked Feb 21 '23 at 07:12

saran3h


-1

votes

2 answers

pypdf gives output with incorrect PDF format

I am using the following code to resize pages in a PDF: from pypdf import PdfReader, PdfWriter, Transformation, PageObject, PaperSize from pypdf.generic import RectangleObject reader = PdfReader("input.pdf") writer = PdfWriter() for page in…

python pypdf

asked Feb 02 '23 at 00:33

Zain Khaishagi

-1

votes

1 answer

Python PDFMerger Too Slow

I am using PDFMerger from PyPDF2. My program basically reads all PDFs in a folder and merges them into a single one. I tested it with 15 PDF files of 99 KB each, and it worked like a charm. The whole process finished within a second.…

python pypdf

asked Dec 06 '22 at 12:48

seneill

-1

votes

1 answer

Split PDF based on value and merge dictionaries inside list in Python

I want to split a pdf based off a value on every page. Every value should be in its own pdf file. I currently have the following list where all values with the pages are displayed: l = [ {'abr': '123 ', 'page': 1}, {'abr': '125 ', 'page':…

python list dictionary pdf pypdf

asked Aug 30 '22 at 09:31

Der Korrigierer

-1

votes

2 answers

Having trouble getting all the page numbers from a pdf file to output

I'm having trouble getting all the page numbers from a PDF file. This is my code. Only one page number is output, but I'm trying to get all the page numbers from my PDF file. How would I fix my code to get all the PDF page numbers? In total…

python pypdf

asked Jul 14 '22 at 18:43

George

-1

votes

1 answer

How can I merge multiple PDF files into one?

from tkinter import filedialog as fd import tkinter as tk from PyPDF2 import PdfFileReader, PdfFileWriter, PdfFileMerger import os mother = tk.Tk() base_pdf = fd.askopenfilename(filetypes=[('PDF files', '.pdf')], title='Wählen Sie bitte die…

python pdf pypdf

asked Jun 28 '22 at 14:46

POPZMOKE

-1

votes

2 answers

Getting none from fields while parsing a pdf file

I am trying to parse a pdf file. I want to get all the values in a list or dictionary of the checkbox values. But I am getting this error. "return OrderedDict((k, v.get('/V', '')) for k, v in fields.items()) AttributeError: 'NoneType' object has no…

python python-3.x pypdf

asked Apr 18 '22 at 20:02

saxope

-1

votes

1 answer

Extract Text from PDF using Python

Hi, I am a Python beginner. I am trying to extract text from only a few boxes in a PDF file (PDF file link). I used the pytesseract library to extract the text, but it returns all the text. I want to limit my text extraction to certain observations in…

python pdf python-tesseract text-extraction pypdf

asked Feb 15 '22 at 08:33

Manish Tripathi

-1

votes

1 answer

PyPDF2 find coordinates of Objects

Is there any way I can find the coordinates of objects in a PDF using Python? I then want to cut the PDF exactly above the highest object and below the lowest object: from PyPDF2 import PdfFileWriter, PdfFileReader with open("in.pdf", "rb") as…

python pypdf

asked Nov 30 '21 at 08:18

Jayklops

-1

votes

2 answers

PDF Parsing a sentence across multiple Lines

Goal: if pdf line contains sub-string, then copy entire sentence (across multiple lines). I am able to print() the line the phrase appears in. Now, once I find this line, I want to go back iterations, until I find a sentence terminator: . ! ?, from…

python pypdf pdfplumber recursionerror

asked Nov 29 '21 at 10:46

StressedBoi69420

1,376
1
12
40

-1

votes

1 answer

Loop through folder and subfolders and merge pdf

I tried to create a script to loop through a parent folder and its subfolders and merge all of the PDFs into one. Below is the code I wrote so far, but I don't know how to combine them into one script. Reference: Merge PDF files The first function is to…

python pdf pypdf

asked Jul 22 '21 at 02:59

Brian C.

-1

votes

1 answer

I cannot find a way to extract underlined text. Can't it be done with pdfminer.six?

I am trying to extract underlined text from a PDF using Python but have not been able to find a correct solution. Can anyone help with this, please?

python pdf pypdf pdfminer pdfplumber

asked Jul 16 '21 at 12:04

ram gengadar

-1

votes

1 answer

Split PDF into 10 page sets (python)

I need to split a roughly 380 page pdf file into sets of 10 pages using python. My initial thoughts are to use PyPDF2 but I have no experience with it. I do need a mechanism to ensure the final PDF is saved despite it being under 10 pages. (eg. 383…

python pdf pypdf

asked Jun 02 '21 at 15:18

Sam Oberly

-1

votes

1 answer

Module not found when I tried to import pyPDF2

My Python version is 3.6. I am able to install pyPDF2. I ran pip install pyPDF2 successfully. When I run pip list, it shows up as 1.26.0. My environment is not base; I set up an environment named pytorch. pyPDF2 is installed successfully in this…

python-3.x pypdf

asked Feb 14 '21 at 20:03

Meng Ge
