Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.

A pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without compiled dependencies; optional features such as AES encryption or image handling need extra packages. It can also work entirely on in-memory byte streams (io.BytesIO in Python 3; the original pyPdf used StringIO) rather than files, allowing PDF manipulation in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022; maintenance resumed under the pypdf name in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022 and was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html


1451 questions
2 votes, 1 answer

Changes made by PyPDF2 to a pdf form don't show up

I have encountered a problem whilst trying to automate filling out pdfs. I'm using the python library pypdf2. I'm using python version 3.10.6. In vscode I have an extension that allows me to view pdfs and there it shows me the changes that I made,…
Kolliden
2 votes, 0 answers

How to extract title of each page from the PDF using Python

I want to extract the title of each page of a PDF, but my PDFs do not have a similar or predefined title size (the title size varies on every page). I tried the following code, but it's not giving me the expected output; instead it's extracting the whole text…
2 votes, 3 answers

PyPDF2 writer function creates blank page

Trying to write a function to combine pages in a PDF document. Streaming the output creates a blank page for an unknown reason. Here is the test case: from PyPDF2 import PdfReader, PdfWriter dr = r"C:\GC" ldr = dr + r"\12L.pdf" writer =…
Simon Nasser
2 votes, 1 answer

Change order of pdf bookmarks using PyPdf2

I created an application that merges multiple PDFs with a bookmark. If the original PDFs already have bookmarks, I want to keep them and just add a bookmark at the beginning of the PDF. I use the following code: The title and the path in the code come…
Mazze
2 votes, 1 answer

Fill PDF form values using PyPDF2 multiple pages but getting same and duplicate data on all pages of pdf

I have used the code below. from PyPDF2 import PdfFileWriter, PdfFileReader from PyPDF2.generic import BooleanObject, NameObject, IndirectObject def set_need_appearances_writer(writer: PdfFileWriter): # See 12.7.2 and 7.7.2 for more…
varun
2 votes, 2 answers

Creating an AWS lambda function to split pdf files in a s3 bucket

I want to write an AWS Lambda function that takes a PDF file from an S3 bucket -> splits the PDF file -> stores the split files in an S3 bucket. I am using the PyPDF module, so I need to know how to use it in an AWS Lambda function as well. The code to split pdf…
2 votes, 0 answers

Merge two pages in pdf to single pdf one page

I have the following code that merges a PDF file with two pages into a single PDF with one page only. Page two goes below page one in portrait form, and everything so far is OK. Is it possible to add padding white spaces around each page before…
YasserKhalil
2 votes, 1 answer

Extracting information from multiple resumes all in PDF format

I have a data set with a column that holds a Google Drive link to each resume. There are 5000 rows, so there are 5000 links. I am trying to extract information like years of experience and salary from these resumes into 2 separate columns. So far I've seen so…
SQL_New_bee
2 votes, 1 answer

Text rotated when merging pdf pages using Pypdf2 and Reportlab

I'm trying to merge two pages: one from ReportLab that has the text I want, and one from my source PDF. But when I merge those two pages, my text is rotated 90 degrees. Pdf created using Report lab -> Overlay Created using Reportlab when Merged…
ConMan77
2 votes, 1 answer

Extract PDF Pages based on Header text in Python

I have an annual report pdf of 'Asian Paints Ltd'. I want to extract the 'Consolidated Balance Sheet Page' (which is the 216th page in the PDF). I've used PyPDF and created a function that extracts all the text, searches for a key term 'Consolidated…
Vansh
2 votes, 0 answers

PYPDF Arabic Text

I am using the PYPDF2 python library to open an already existing PDF and modify a value such as {{name}} but I am facing two problems: The words don't seem to be represented correctly (example line: [(But {{name}} did no)-0.6 (t c)1.6 (ome!)]TJ).…
Sara Kat
2 votes, 1 answer

how to split pdf file into multiple pdf files by specific word?

I have one PDF file. I want to split that file into multiple PDF files at a specific word that appears in the file. How can I do that in Python?
Avadhesh
2 votes, 2 answers

Extract text and tables of a PDF file in Python

I am looking for a solution to extract both text and tables from a PDF file. While some packages are good at extracting text, they are not good enough at extracting tables. One solution would be using Azure Form Recognizer Layout Model, but it…
Sam S.
2 votes, 1 answer

PyPDF2 decoding issue when adding annotations in Chinese characters with addJS

I want to use PyPDF2 to add annotations programmatically using addJS. It works very well for Latin characters but not for Chinese characters. I tried encoding with UTF-8, but that does not seem to work either. Here is the code: from PyPDF2 import…
Stanley
2 votes, 1 answer

How to draw a shape inside a pdf with python?

I would like to draw a shape such as a rectangle inside a PDF. I have tried the code below, but it adds text to the PDF instead. How can I draw it? # Add text to Existing PDF using Python from pyPdf import PdfFileWriter, PdfFileReader import…
BSFU