Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs and merge entire files.

pypdf is a pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.
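As a rough illustration of the page-by-page splitting listed above, here is a minimal sketch that cuts a PDF into fixed-size chunks. The function names `chunk_ranges` and `split_pdf`, the output naming scheme, and the default chunk size of 50 are assumptions made for this example, not part of pypdf. The pypdf calls used are `PdfReader`, `reader.pages`, `PdfWriter.add_page`, and `PdfWriter.write`.

```python
from pathlib import Path


def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) index pairs that cover total_pages in chunk_size steps."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(src, out_dir, chunk_size=50):
    """Write src as several PDFs of at most chunk_size pages each (hypothetical helper)."""
    # Imported here so chunk_ranges stays usable where pypdf is not installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for start, stop in chunk_ranges(len(reader.pages), chunk_size):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        # File name uses 1-based page numbers, e.g. book_1-50.pdf (an assumed convention).
        target = out_dir / f"{Path(src).stem}_{start + 1}-{stop}.pdf"
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written
```

Merging works the same way in reverse: add pages from several readers to a single `PdfWriter` before writing it out.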

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory stream objects (StringIO in the original Python 2 releases, io.BytesIO in Python 3) rather than file streams, allowing for PDF manipulation in memory. It is therefore a useful tool for websites that manage or manipulate PDFs.
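A minimal sketch of in-memory use, assuming Python 3 and a current pypdf release. PDF data is binary, so the buffer here is `io.BytesIO` rather than `StringIO`. The helper names `blank_pdf_bytes` and `count_pages` and the US Letter page size (612 x 792 points) are choices made for this example only.

```python
import io


def blank_pdf_bytes(width=612, height=792):
    """Build a one-page blank PDF entirely in memory and return its bytes."""
    # Imported lazily so this module loads even where pypdf is not installed.
    from pypdf import PdfWriter

    writer = PdfWriter()
    writer.add_blank_page(width=width, height=height)
    buffer = io.BytesIO()
    writer.write(buffer)
    return buffer.getvalue()


def count_pages(pdf_bytes):
    """Return the page count of a PDF supplied as bytes, without touching disk."""
    from pypdf import PdfReader

    return len(PdfReader(io.BytesIO(pdf_bytes)).pages)
```

In a web application, the bytes returned by a helper like `blank_pdf_bytes` could be sent directly as the response body, so no temporary file is needed.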

pypdf was inactive from 2010 to 2022; maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022 but has since been deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.
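Because only the package name changed, code can often move from PyPDF2 to pypdf by changing the import. The sketch below picks whichever of the two packages is installed. The helper names `installed_pdf_package` and `load_pdf_classes` are hypothetical, and the sketch assumes a PyPDF2 3.x release, which exposes the same `PdfReader` and `PdfWriter` class names as pypdf.

```python
import importlib
import importlib.util


def installed_pdf_package(candidates=("pypdf", "PyPDF2")):
    """Return the first importable package name from candidates, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def load_pdf_classes():
    """Import PdfReader and PdfWriter from whichever package is installed."""
    name = installed_pdf_package()
    if name is None:
        raise ImportError(
            "neither pypdf nor PyPDF2 is installed; try: pip install pypdf"
        )
    module = importlib.import_module(name)
    return module.PdfReader, module.PdfWriter
```

Module names are case-sensitive: the old package imports as `PyPDF2` and the current one as `pypdf`, so a spelling such as `pyPDF2` is a different name as far as Python is concerned.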

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions
0 votes
0 answers

No module named 'pyPDF2'

I am using Python 3.5, and it would appear that I've successfully installed PyPDF2. But for some reason, I get an ImportError. Would appreciate any help given.
Riley Hun
0 votes
0 answers

splitting PDF files in 50 pages interval

I have a Ghostscript command to split PDF books at 50-page intervals. The problem is that GS removes the transparency (I think this is called the alpha channel in technical terms: http://www.peteryu.ca/tutorials/publishing/pdf_manipulation_tips) of the…
Dellu
0 votes
1 answer

python pdf (PyPDF2 module) - How to split/merge this?

I was trying to split and merge PDF files so that I can remove the first page of each PDF file. Here's the code: #python3 #split and merge pdf files! import os, PyPDF2 pdfFiles = [] …
Hashnut
0 votes
1 answer

how to create pdf in django using pypdf

this is my views def pdf_datakar(request): from fpdf import FPDF pdf = FPDF(format='letter') pdf.add_page() pdf.set_font("Arial", size=12) pdf.cell(200, 10, txt="Welcome to Python!", align="C") …
Gusan
0 votes
1 answer

How to get pyPdf to work with os or glob

My goal is to read a directory with several PDF files and return the number of pages in each file using Python. I'm trying to use the pyPdf library but it fails. If I do this: from pyPdf import PdfFileReader testFile = "C:\\path\\file.pdf" pdfFile…
Tensigh
0 votes
1 answer

find the pypdfocr config.yaml file

Where can I locate the config.yaml file for pypdfocr? In the pypdfocr release info, it mentions a config file I can use to specify where the OCR'ed documents are filed. For example: pypdfocr filename.pdf -f -c config.yaml where the config.yaml file…
Neue1987
0 votes
1 answer

PyPDF module doesn't make valid pdf file

I'm trying to write a Python program to manipulate my PDF Beamer presentations. The professor uses on-click dynamic transitions, so one page has several click transitions. I want to print those presentations, but I have around 5000 pages. So I want to…
0 votes
1 answer

Unicode error PyPdf

I am trying to download several PDFs using the requests library and merge them together using pypdf. In general this works fine, but for some PDFs I just get an error. MWE.py import requests from pyPdf import PdfFileWriter, PdfFileReader from…
Rambo Ramon
0 votes
1 answer

python-pypdf split pdf based a list of page ranges

I am trying to split a large pdf based on the list of names and list of pages. For example first name has three pages, second has one page, the third has five pages and so on. I created the following script and it is not working correctly. For…
dvruiz
0 votes
2 answers

PyPDF2 append a PDF from the 2nd page

I'm learning how to program using the book "Automate the Boring Stuff". I have, however, stumbled upon a roadblock in chapter 13: "Merge multiple PDF's, but omit the title page from all but the first page". In the book, they do it by looping over the…
Sybie
0 votes
1 answer

Problems with PyPDF ignoring some data

Hoping for some help, as I can't find a solution. We currently have a lot of manual data inputs through people reading PDF files, and I have been asked to find a way to cut this time down. My solution would be to transform the PDF to a much easier…
Czakky
0 votes
1 answer

pyPDF IOError Exception in OSX

I am trying to open a PDF (named kalimera.pdf) using PdfFileReader from the pyPdf module, using the following commands: from pyPdf import PdfFileReader, PdfFileWriter document = PdfFileReader(open('kalimera.pdf', 'rb')) I get the following…
bergercookie
0 votes
1 answer

Reading PDF using PyPDF2 not resulting anything

Here is my code, courtesy of http://code.activestate.com/recipes/511465-pure-python-pdf-to-text-converter/. I modified it to use the next version of PyPDF: import PyPDF2 def getPDFContent(path): content = "" # Load PDF into pyPDF pdf =…
Guru
0 votes
1 answer

Windows Error: 32 when trying to rename file in python

I'm trying to rename some PDF files using pyPdf, and my code seems to work fine until it reaches the rename statement. The while/if block of code looks for the page number where the string "This string" is located and stops when it is found. Having the page…
Ger Cas
0 votes
1 answer

Merging PDF's with PyPDF2 with inputs based on file iterator

I have two folders with PDF's of identical file names. I want to iterate through the first folder, get the first 3 characters of the filename, make that the 'current' page name, then use that value to grab the 2 corresponding PDF's from both…
DPSSpatial