Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from them.

A Pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory streams (StringIO in the original pyPdf, io.BytesIO in current pypdf) rather than file streams, allowing PDFs to be manipulated in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. It has been actively maintained again since December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it has since been deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0. Just the package name was changed to pypdf.
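
Since only the package name changed, code that has to run in environments with either package installed can look the name up at runtime. The sketch below is one possible approach using only the standard library; the function name `find_pdf_package` is hypothetical.

```python
import importlib
import importlib.util


def find_pdf_package(candidates=("pypdf", "PyPDF2")):
    """Return the name of the first installed package in `candidates`, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


package_name = find_pdf_package()
if package_name is None:
    print("Neither pypdf nor PyPDF2 is installed")
else:
    # Both packages expose the same top-level names, e.g. PdfReader and PdfWriter.
    pdf_lib = importlib.import_module(package_name)
    print(f"Using {package_name}")
```

For new code, importing from `pypdf` directly is the simpler choice, because PyPDF2 is deprecated.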

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions
4 votes, 1 answer

Python PyPDF2 merge rotated pages

I'm using the Python ReportLab canvas to generate an overlay document with watermarks and merge it into a source PDF document (with PyPDF2). Recently I encountered a problem with a document that contains rotated pages (the /Rotate key is present for the Page object…
Max Kamenkov
4 votes, 3 answers

Generating & Merging PDF Files in Python

I want to automatically generate booking confirmation PDF files in Python. Most of the content will be static (i.e. logos, booking terms, phone numbers), with a few dynamic bits (dates, costs, etc). From the user side, the simplest way to do this…
Humphrey
4 votes, 3 answers

I can't install the pyPdf package: No distributions at all found for pyPdf

I'm trying to install this package... $ pip search pyPdf PyPDFLite - Simple PDF Writer. pypdfocr - Converts a scanned PDF into an OCR'ed pdf using Tesseract-OCR and Ghostscript pyPdf - PDF…
Lucas Simon
4 votes, 0 answers

How can I extract just the visible text from a PDF, ignoring cropped parts?

I want to extract text from a cropped PDF document. I tried pdfminer, but it also returned the cropped text. I need only the text in the visible area.
Umesha D
4 votes, 2 answers

Is it possible to extract a pdf with its white spaces in Python?

I have been attempting to extract a pdf with Python after a tool was created to extract it using java and pdfbox. While the Java implementation was successful for the same pdf, I have been struggling to do the same in python since both pdfminer and…
4 votes, 1 answer

Python encoding for Turkish characters

I have to read PDF books that are Turkish stories. I found a library called pyPdf. My test function, shown below, doesn't encode the text correctly. I think I need a Turkish codec package. Am I wrong? If I am wrong, how can I solve this…
hinzir
4 votes, 1 answer

pyPDF bad xref character at startxref

I'm using pyPDF for PDF page extraction and merging. My issue isn't specific to pyPDF, since I've run into the same type of error with pdfSharp in the past on the same PDF file. The problem is that I'm getting an error when trying to…
brandon
3 votes, 1 answer

Assertion Error in Pypdf package in Python

I am using Python 2.4 and PyPdf 1.13 on a Windows platform. I am trying to merge PDF files from a list into one using the following code: import os from pyPdf import PdfFileWriter,…
gaya3
3 votes, 0 answers

Check which pages of a PDF document are encrypted using pypdf

I would like to know which pages of a PDF document are encrypted, because some PDF documents are merged from one document that was encrypted and another that was not. This means some pages were not encrypted. Here I'm trying to…
Quinten
3 votes, 1 answer

Langchain pyPDFLoader

I am currently trying to get started working with Langchain. I am working in Anaconda/Spyder IDE: # Imports import os from langchain.llms import OpenAI from langchain.document_loaders import TextLoader from langchain.document_loaders import…
NeilS
3 votes, 2 answers

Error: FloatObject (b'0.000000000000-14210855') invalid; use 0.0 instead while using PyPDF2

I am using a function to count occurrences of a given word in a PDF using PyPDF2. While the function is running, I get this message in the terminal: FloatObject (b'0.000000000000-14210855') invalid; use 0.0 instead My code: def count_words(word): print() …
3 votes, 1 answer

PyPDF2 gives "TypeError: argument should be integer or None, not 'NullObject'"

With this script I want to create a single PDF file that combines the many PDFs in the folder that I pass from the terminal in this line of code, input_folder_pdf = sys.argv[1], and it creates the output folder if it does not exist. This code…
3 votes, 1 answer

PyPdf does not read the pdf text line by line

I was using PyPdf to read text from a PDF file. However, pyPDF does not read the text in the PDF line by line; it reads it in a haphazard manner, inserting newlines where none are present in the PDF. import PyPDF2 pdf_path =…
Himanshu Poddar
3 votes, 2 answers

PyPDF2.errors.PdfReadError: PDF starts with '♣▬', but '%PDF-' expected

I have a folder containing a lot of sub-folders, with PDF files inside. It's a real mess to find information in these files, so I'm making a program to parse these folders and files, searching for a keyword in the PDF files, and returning the names…
SejAC
3 votes, 5 answers

Odoo15 - ModuleNotFoundError: No module named 'PyPDF2'

I'm new to Odoo. I use pyenv to host odoo and all the dependencies. All under odoo system user, and I was able to start odoo service: ~# systemctl status odoo-15 ● odoo-15.service - Odoo15 Loaded: loaded (/etc/systemd/system/odoo-15.service;…
redpill