Questions tagged [pypdf]

pypdf is a pure-python PDF library capable of splitting, merging together, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs as well as merge entire files together.

A Pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory binary streams (io.BytesIO objects) rather than files on disk, allowing for PDF manipulation in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022; maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html


1451 questions
-3 votes, 2 answers

I have pyPDF2 installed in my venv interpreter (Python 3.6.6), but I'm not able to import it. What am I doing wrong?

I use PyCharm.
Aakash Dusane • 388 • 4 • 17
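
The excerpt does not show the actual error, so this is only a diagnostic sketch using the standard library. `diagnose_import` is a hypothetical helper: it reports the interpreter that is actually running (worth comparing with the project interpreter configured in PyCharm) and whether that interpreter can find a given module. Import names are case-sensitive: the package is imported as `PyPDF2`, so `import pyPDF2` fails even when the package is installed.

```python
import importlib.util
import sys


def diagnose_import(module_name: str) -> dict:
    """Report which interpreter is running and whether it can see a module."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,  # compare with PyCharm's project interpreter
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # Case matters: "PyPDF2" is the importable name, not "pyPDF2".
    for name in ("PyPDF2", "pypdf"):
        print(name, diagnose_import(name))
```

If `found` is False but `pip` reports the package as installed, `pip` most likely targeted a different interpreter than the one PyCharm runs.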
-3 votes, 2 answers

Text Mining from PDF file using Python

I have the annual report of a company (in .pdf format) and I want to extract the balance sheet and other related reports from it using Python. I tried the PyPDF2 library, but it extracts highly unstructured text. Is there any way?
PRAYANK • 57 • 1 • 9
-3 votes, 2 answers

Looping over a list - 'int' object is not subscriptable

I've uploaded a PDF using PyPDF2. Here's my code:

    PageNo=list(range(0,pfr.numPages))
    for i in PageNo:
        pg = writer.addPage(i)

PageNo creates a list of all the page numbers and I'm trying to add each page from the list into a new PDF, but when I…
user2744315 • 77 • 1 • 1 • 7
-3 votes, 1 answer

Extract URLs, bookmarks, markups and comments from a PDF using PyPDF2 or pdfminer

I tried to extract URLs, comments or bookmarks from a PDF using PyPDF2 or pdfminer. I can't see /Annots or URI entries, even though there are URLs and bookmarks present in the PDF.
user222213 • 111 • 1 • 2 • 12
-3 votes, 1 answer

Where am I going wrong?

At the moment my code is extracting data out of a PDF & counting the word frequency. I've been trying for a while now to arrange it in order of frequency but haven't been able to. I've looked at multiple similar answers but can't find an answer that…
Trent • 1
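
Assuming the PDF text has already been extracted into a string (for example by joining `page.extract_text()` over all pages), sorting words by frequency is a job for the standard library's `collections.Counter`. `words_by_frequency` is a hypothetical helper, and its regex is a simple assumption about what counts as a word.

```python
import re
from collections import Counter
from typing import List, Tuple


def words_by_frequency(text: str) -> List[Tuple[str, int]]:
    """Return (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    # most_common() with no argument returns every word sorted by count,
    # highest first; words with equal counts keep first-seen order.
    return Counter(words).most_common()
```

For example, `words_by_frequency("The cat saw the dog. The cat ran.")` starts with `("the", 3)` followed by `("cat", 2)`.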
-3 votes, 1 answer

How to open a PDF file in binary mode

I want to read the metadata of PDF files, so I am using the pyPdf package, but for some files I get this error: "PdfFileReader stream/file object is not in binary mode, it may not be read correctly".
praveen JP • 9 • 2 • 10
-4 votes, 0 answers

Extra spaces, extra newline characters, and unable to identify bold headers while reading a PDF in Python

Hi everyone, I have a PDF file with some bold side headings (visually bold, not capital letters). The paragraphs between the headings are treated as sections. I am searching for a particular word in the PDF. If any section has that…
-4 votes, 1 answer

My numbers can't be saved to a list in a loop (Python)

I want to save my result numbers in a list, but I can't. Why not? I used append() to add the numbers to my result_list, but it didn't work. I am using PyPDF2 to process PDF files. object =…
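
The excerpt is truncated, so the exact bug is unknown. Two common causes are re-creating the list inside the loop and writing `result_list = result_list.append(x)`, which replaces the list with `None`. The sketch below avoids both. `collect_numbers` is a hypothetical helper, and its input is assumed to be page texts, for example `[p.extract_text() for p in PdfReader(path).pages]`.

```python
import re
from typing import Iterable, List


def collect_numbers(page_texts: Iterable[str]) -> List[int]:
    """Gather every integer found in a sequence of page texts."""
    result_list: List[int] = []  # create the list once, before the loop
    for text in page_texts:
        for match in re.findall(r"\d+", text):
            # append() changes the list in place and returns None,
            # so its result must not be assigned back to result_list.
            result_list.append(int(match))
    return result_list
```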
-5 votes, 2 answers

No module named 'PyPDF2._codecs', even after already installed

I have installed PyPDF2==2.3.0, but I still get the error below when I import PyPDF2. The error message is: ModuleNotFoundError: No module named 'PyPDF2._codecs'
Guy • 21 • 3
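
One way to investigate a missing submodule such as `PyPDF2._codecs` is to check where the package is actually imported from and which version is installed. For example, a local file or folder named `PyPDF2`, or a partially upgraded install, can shadow the real package. This stdlib-only sketch assumes Python 3.8+ for `importlib.metadata`; `where_is` is a hypothetical helper.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Dict, Optional


def where_is(package: str) -> Dict[str, Optional[str]]:
    """Show which file a package would be imported from and its installed version."""
    spec = importlib.util.find_spec(package)
    try:
        installed: Optional[str] = version(package)
    except PackageNotFoundError:
        installed = None  # no installed distribution with this name
    return {"origin": spec.origin if spec else None, "version": installed}
```

If `where_is("PyPDF2")` points somewhere unexpected, removing the stray copy and reinstalling, or switching to pypdf, are reasonable next steps.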