Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs and merge entire files together.

A pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Being pure Python, it should run on any Python platform, and its core features do not depend on external libraries (some optional features, such as AES encryption, need extra packages). It can also work entirely on in-memory byte streams (io.BytesIO in Python 3) rather than files, allowing PDF manipulation in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. Maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022 but was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html


1451 questions
6 votes, 1 answer

Reading/writing XMP metadata on PDF files through pypdf

I can read XMP metadata through pypdf with this code: from pypdf import PdfReader; a = PdfReader("file.pdf"); b = a.xmp_metadata; c = b.pdf_keywords. But is this the best way? And what if I don't use the pdf_keywords property? Is there any way to set…
JuanDeLosMuertos
6 votes, 3 answers

PyPDF2 split pdf by pages

I want to split a PDF file using PyPDF2. All the examples online are too difficult, don't work, or always give the error "AttributeError: 'PdfFileWriter' object has no attribute 'stream'". Can someone help with it? I need to separate one PDF with 3 pages into three…
Acamori
6 votes, 7 answers

Reading pdf files line by line using python

I used the following code to read the PDF file, but it does not read it. What could be the reason? from PyPDF2 import PdfFileReader; reader = PdfFileReader("example.pdf"); contents =…
Rahul Pipalia
6 votes, 3 answers

PdfFileReader: PdfReadError: Could not find xref table at specified location

I am trying to read a PDF file in Python with: from PyPDF2 import PdfFileReader, PdfFileWriter; test_reader = PdfFileReader(file("test.pdf", "rb")). The line above throws the error: PyPDF2.utils.PdfReadError: Could not find xref table at specified…
Nitin Bhojwani
6 votes, 2 answers

How to iterate over all the objects in a PDF page and check which ones are text objects?

I want to iterate over all the objects on a page of a PDF using pypdf. I also want to check the type of each object: whether it is text or graphics. A code snippet would be a great help. Thanks a lot.
Shan
6 votes, 1 answer

pyPdf error invalid argument

I'm currently using pyPdf to open, read, and write the content of a PDF file. For that I use these lines of code: from pyPdf import PdfFileWriter, PdfFileReader; pdf = PdfFileReader(file("/myPdfFile.pdf", "w+b")); content =…
kschaeffler
5 votes, 1 answer

PyPDF Merge and Write issue

I am getting an unexpected error when using this. The first section is from a script that I found online, and I am trying to use it to pull a particular section identified in the PDF's outline. Everything works fine, except right at…
user971847
5 votes, 2 answers

pyPdf merging and displaying as HttpResponse through Django

I'm having trouble incorporating pyPdf logic to merge two PDF files into my Django site. I have written code that works to merge files when run in a Python file on the local server (but I need to explicitly identify which files to merge: from pyPdf…
Joseph
5 votes, 0 answers

Python: Reading PDF with PyPDF2 results in Superfluous whitespace error

I've been struggling with reading text from a PDF in Python. What I need is for PyPDF2 to find a given string and return a reference number placed next to that string. This is the code I'm trying: import os; import shutil; import PyPDF2; from PyPDF2…
darkspeed
5 votes, 1 answer

error: Unable to find trailer dictionary while recovering damaged file

PyPDF2 sometimes fails to decrypt some PDF files, so I am trying to decrypt them with pikepdf, but I am getting this error: "Unable to find trailer dictionary while recovering damaged file". Any ideas?
Yordan
5 votes, 1 answer

Change font type/size using PDF annotations

I'm writing data to a PDF with named fields and then changing the attributes of those fields to make them read-only. This is great, but I'd like to be able to manipulate the text as well: change the font size, and maybe even the font itself. According to…
markwalker_
5 votes, 5 answers

How to merge two landscape pdf pages using pyPdf

I'm having trouble merging two PDF files with pyPdf. When I run the following code, the watermark (page1) looks fine, but page2 has been rotated 90 degrees clockwise. Any ideas what's going on? from pyPdf import PdfFileWriter,…
Humphrey
5 votes, 1 answer

Correcting PDF pages with wrong orientation information with PyPDF2

I'm trying to merge a number of PDF documents into one. However, the documents come from different sources: some were created on a computer, others were scanned with different scanners and software. I'm scaling them all to A4 size before joining…
5 votes, 2 answers

How to stitch two pdf pages into one in python

I am using Python, and I want to combine two PDF pages into a single page. My goal is to combine these two pages into one, not two PDFs. Is there any way to combine the two PDFs one by one? I don't want to merge these two. Without overlapping,…
tins johny
5 votes, 1 answer

Convert from PDF to text: lines and words are broken

I want to convert a PDF file to text with PyPDF2, but the converted text looks different from the PDF file. Specifically, one line in the PDF is broken into multiple lines in the text, and words may be broken as well. Attached are the PDF and the text file I got with…
hongftu