Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.

pypdf is a pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory stream objects such as StringIO (io.BytesIO on Python 3) rather than files on disk, allowing PDFs to be manipulated in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. Maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022 but has since been deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions
0 votes, 1 answer

PyPDF2: TypeError: coercing to Unicode: need string or buffer, PdfFileWriter found

Reworking code to include Context Managers via with statements. However I am receiving a Traceback: using Python 2.7 on Windows Traceback (most recent call last): File "CommissionSecurity.py", line 52, in with open(output, 'w') as…
AlliDeacon
0 votes, 2 answers

Error in the coding of the characters in reading a PDF

I need to read this PDF. I am using the following code: from PyPDF2 import PdfFileReader f = open('myfile.pdf', 'rb') reader = PdfFileReader(f) content = reader.getPage(0).extractText() f.close() content = ' '.join(content.replace('\xa0', '…
macabeus
0 votes, 1 answer

pyPdf extracting info from IndirectObject

I am writing a script that will read the creation and modification dates of PDF files. I am using the pyPdf package in Python. I have the following code: from pyPdf import PdfFileWriter, PdfFileReader input1 =…
user4505419
0 votes, 1 answer

Not getting the text from PDF in right format when reading using pyPDF

I was trying to read the PDF document on the following link using the pyPDF package in Python. http://www.hdfcsec.com/Share-Market-Research/Research-Details/StockReports/3011454 I used the following code to read the PDF: ###########Beginning of…
0 votes, 1 answer

How to reset the output file?

I want to split a long PDF document into many parts, e.g. part 1 comprising pages 3-14, part 2 comprising pages 15-19, part 3 comprising pages 20-27, using PyPDF2. I coded a loop that takes the relevant pages out of the original PDF and saves them…
sh_python
0 votes, 1 answer

Start first PDF page a certain distance from top then start at the top on every page after that?

Using XHTMLPDF2 in Python; great tool! I'm generating PDFs to integrate into yet another PDF, so I need the first page to start at a certain height from the top (say 432pt at times, 200pt at others; it's in a variable). Every page after that,…
Miguel Diaz
0 votes, 4 answers

Python - convert pdf to text, encoding error

I tried to convert a PDF document to a text file (example PDF file link). I tried the code below, but the extracted text is garbled, like ??챘#?遏?h첨챦_철?‾n?~w??¬?k How can I fix it? #!/usr/bin/python # -*- coding: cp949 -*- # -*- coding: utf-8 -*- # -*-…
user3704652
0 votes, 1 answer

Using PyPDF2 to merge files into multiple output files

Here is the code block that is causing the issue. The loop will append the new file each time, which is not what I am trying to accomplish. For example, outputfile1 is input1.pdf, outputfile2 is input1.pdf + input2.pdf... I am trying to merge…
user3482598
0 votes, 1 answer

Converting a PDF file consisting of tables into a text document containing tables in Python

I have a PDF file that consists of general tables of names, addresses, phone numbers, and fax numbers. What I want is: 1) read this file, get the content of each row, and put it in a database, i.e. get the name from the corresponding name column of…
0 votes, 3 answers

How to split/crop a pdf along the middle using pyPdf

I have a PDF that looks like this, and I'd like to crop all the text out, almost right down the middle of the page. I found this script that does something similar: def splitHorizontal(): from pyPdf import PdfFileWriter, PdfFileReader input1 =…
Jeff
0 votes, 1 answer

pyPDF2 merging error coercing to Unicode

I'm trying to give pypdf some pdfs to merge and it throws a coercing to Unicode error. My code is from PyPDF2 import PdfFileMerger, PdfFileReader import pdfcrowd from django.http import HttpResponse def generate_pdf(request): list_of_pages =…
user3982654
0 votes, 0 answers

Crack a PDF using a Python script

I have to write a script (for a university class, and I must send it to the professor by Sunday) that will crack a PDF file. I have tried a lot so far but can't make it work. My code is: #!/usr/bin/python2.7 from PyPDF2 import PdfFileReader,…
dimargy
0 votes, 1 answer

How to use global variables in tkinter and PyPDF2 to merge PDF files

Been using Python for a very short amount of time and can't figure out what is wrong with this code. I can't find any examples that would work for my code, so I'm asking here. import sys import os from PyPDF2 import PdfFileReader, PdfFileMerger,…
user1113569
0 votes, 1 answer

PyPDF hangs on big drawing

Here is the PDF I'm trying to parse. My code (below) hangs on the following line: content += " ".join(extract.strip().split()) It hangs on page 21, which is a big drawing. I wouldn't mind just skipping pages like this big drawing, but I'm not…
Sid Kwakkel
0 votes, 2 answers

(PyPDF2) Attempt to merge PDFs produces error

I've been trying to add a watermark as shown in Add text to Existing PDF using Python, but I keep getting an error regarding the PDF data from reportlab. Is it a problem with the input PDF? Setup: Python 3.3 (Anaconda Distribution), Windows 7 from…
Ben Southgate