Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, retrieve text and metadata from PDFs, and merge whole files together.

A pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, its core features should run on any Python platform without depending on external libraries. It can also work entirely on in-memory binary streams (io.BytesIO on Python 3) rather than files on disk, which allows PDF manipulation in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022; active maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html


1451 questions
-1 votes, 1 answer

How can I interchangeably use glob.glob("*.pdf") and os.listdir("./directory")?

I am trying to merge the PDF files inside a folder. I tried running the code from the same directory and it worked; however, when I copied the code to a different location and specified the directory path of the PDF files, the merging does not happen…
sebastian
-1 votes, 1 answer

Challenges with PDF/A files for extraction using Python

We have some PDF/A files to extract data from, and when we use standard PDF extraction libraries, nothing is returned for the entire page. The same program works perfectly for standard PDFs and returns values. Can anyone help with how to…
Denish
-1 votes, 1 answer

Split image/pdf based on specific text with Python

I want to split a PDF (or an image, if needed) based on the text in it, so that I get each question with its options separately, like a screenshot of just that question and its options. Sample PDF…
-1 votes, 1 answer

AttributeError: '_io.BufferedReader' object has no attribute 'page

I am trying to extract text from a PDF file that consists of text, tables, and images, and want to save the result on my local system. This was the code I was developing: from PyPDF2 import PdfFileReader # Load the pdf to the PdfFileReader object with…
netha
-1 votes, 1 answer

How do I fix this error when installing pyPDF2 in Python

I receive the following error when trying to install PyPDF2 with this command at the command prompt: python -m pip install pyPDF2. Any suggestions to resolve it? Error result: Microsoft Windows [Version 10.0.19042.572] (c) 2020 Microsoft Corporation.…
vitaminC
-1 votes, 1 answer

I have converted a PDF file to CSV using Anaconda Python 3, but the converted CSV file is not in a readable form. How do I make it readable?

# importing required modules import PyPDF2 # creating a pdf file object pdfFileObj = open(path, 'rb') # creating a pdf reader object pdfReader = PyPDF2.PdfFileReader(pdfFileObj) # printing number of pages in pdf file…
Jawahar
-1 votes, 1 answer

Reading a PDF file into text yields no results

So I'm trying something very simple: I just want to read the text from a PDF file into a variable, that's it. This is what I'm getting: Does anyone know a reliable way to just read a PDF into a text file?
-1 votes, 1 answer

How to flip a pdf page upside down using python?

I'm trying to flip PDF pages upside down using Python. I have tried multiple libraries, such as PyPDF2, PyMuPDF, and pdfminer. There is documentation on how to rotate a page, but that is not what I'm looking for. The closest solution I found was on one…
Ajay Alex
-1 votes, 1 answer

What is wrong with this PDF when trying to get a word count

I am trying to write a Python app to give me a word count for PDFs. I've run into something odd with this PDF, though. When I extract the text from the PDF, it shows up as some sort of binary/symbol garbage. I have tried the PyPDF2 and PyMuPDF libraries with…
tynick
-1 votes, 1 answer

How to split a PDF every 4 pages using PyPDF2 in python?

I found sample code online that splits a PDF into 2-page chunks but couldn't figure out how to change it to 4 pages. Any tips will be appreciated. #!/usr/bin/env python3 from PyPDF2 import PdfFileWriter, PdfFileReader import glob, sys pdfs =…
FAN360
-1 votes, 1 answer

How to convert output into a pdf file

Say I have some functions, in this case a function that calculates the mode and another that calculates the mean of a list of numbers, followed by printing the statement 'Hello World!' and finally printing a…
Leockl
-1 votes, 1 answer

Use Python to determine if PDF was generated by Google Docs

I'd like to use Python to tell if a PDF was created by Google Docs. Is there any sort of metadata I can gather with PyPDF2 to determine this?
Arya
-1 votes, 1 answer

How can I rotate every page in a PDF with Python / PyPDF4?

I scanned a bunch of papers into a PDF, but they all seem to be rotated. Is there a way to rotate the pages with Python? I did see the question "Python - Batch rotate pdf with PyPDF2" but am looking for a more generic solution.
Vivek Gani
-1 votes, 1 answer

The extractText() function does not return text

pdfFileObject = open('MDD.pdf', 'rb') pdfReader = PyPDF2.PdfFileReader(pdfFileObject) count = pdfReader.numPages for i in range(count): page = pdfReader.getPage(i) print(page.extractText() Above is my code, and when I run the script it just…
danited
-1 votes, 2 answers

Unable to import PyPDF2 after installing

I have installed PyPDF2 via pip3 install PyPDF2, and the installation was successful. I am trying to import it into Python without success, and I do not know what is going on! I am using Python 3.7. After entering: from PyPDF2 import PdfFileReader The…
chickenwings123
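The excerpt is cut off before the error message, so the cause is unknown. A frequent reason for "installed but cannot import" is that pip installed the package into a different interpreter from the one running the script. This standard-library sketch prints which interpreter is running and whether it can see each package, and `find_spec` checks without importing anything. `package_report` is a hypothetical helper. Installing with `python -m pip install pypdf`, using that same interpreter, avoids the mismatch, and new code should import `pypdf` rather than `PyPDF2`.

```python
import importlib.util
import sys


def package_report(names=("pypdf", "PyPDF2")) -> dict[str, bool]:
    """Map each top-level package name to whether this interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print("Interpreter:", sys.executable)
for name, found in package_report().items():
    print(f"{name}: {'importable' if found else 'NOT found'} for this interpreter")
```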