Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, and it can retrieve text and metadata from PDFs.

A Pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory stream objects (io.BytesIO on Python 3, where PDF data is binary; StringIO in the Python 2 era) rather than files on disk, allowing PDFs to be manipulated in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. Maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it has since been deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.
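
Since the rename mostly affects the import line, code that has to run in environments with either package can check which one is available. The sketch below uses only the standard library; `pick_pdf_package` is a hypothetical helper name, not something either package provides.

```python
import importlib.util


def pick_pdf_package(candidates=("pypdf", "PyPDF2")):
    """Return the first importable package name from `candidates`, or None."""
    for name in candidates:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


package = pick_pdf_package()
if package is None:
    print("Neither pypdf nor PyPDF2 is installed; try: pip install pypdf")
else:
    print(f"Using {package}")
```

In application code the more common form is a `try: from pypdf import PdfReader` block with `except ImportError: from PyPDF2 import PdfReader` as the fallback.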

See: https://pypdf.readthedocs.io/en/latest/meta/history.html

1451 questions
0 votes, 1 answer

Python formatWarning and cross-package errors

Okay, I am confused. I am using two Python packages - PyPDF2 and SQLAlchemy. SQLAlchemy is raising a warning using python's warning.warn(), and somehow calling a formatWarning() function in PyPDF2, which also uses python's warning.warn(). Is this…
Adam Morris
0 votes, 1 answer

Read mail attachment & that use it for pyPdf2: PdfFileReader

I'm just trying to read a mail attachment (an attached PDF file) in memory so that I can use it with pyPdf2. Here is the basic code: from pyPdf2 import PdfFileReader . . for attachment in message.attachments: attachment_name =…
Niks Jain
0 votes, 1 answer

PyPdf python encode method

text = getPDFContent(path).encode("ascii", "ignore") This is my actual code. Can anyone tell me what "ignore" does? And is there another parameter that avoids copying non-ASCII chars? (I copied the function. It is used to get the contents of a…
diegodalbosco
0 votes, 1 answer

Can PyPdf2 recognize a wildcard

I am creating a python script that uses PyPdf2. I am trying to open and append a file using a wild card in the file name. It is taking the * literally in the file name. Is there a way to declare wildcards with the open and merge functionality in…
user12059
0 votes, 3 answers

Are PDF box coordinates relative or absolute?

I want to programmatically edit a PDF using pyPDF. Currently, I'm struggling with interpreting the various PDF boxes' (TrimBox, MediaBox etc.) dimensions. Each box has four dimensions stored as a four-tuple, e.g.: TrimBox: 56.69 56.69 …
Daniel Werner
0 votes, 1 answer

Aligning two PDFs for a merge using Cairo and pyPDF

I need to programmatically add additional graphical elements onto an existing, static PDF book cover. Right now I use pycairo to draw onto a transparent PDFSurface, then merge it into the existing static PDF using pyPdf. This way, the PDFSurface…
Daniel Werner
0 votes, 0 answers

int() got an unexpected keyword argument 'base' error at line 803 in pdf.py when using PyPDF2

When I execute the following code in Visual Studio using Python Tools, IronPython 2.7, and PyPDF2 v1.20, I get this error: "int() got an unexpected keyword argument 'base'" at line 803 in pdf.py. This is my complete code: import clr…
eureka
0 votes, 1 answer

How to add more tolerance for whitespaces in PyPDF2?

I'm looking for the easiest way to convert PDF to plain text in Python. PyPDF2 seemed to be very easy, here is what I have: def test_pdf(filename): import PyPDF2 pdf = PyPDF2.PdfFileReader(open(filename, "rb")) for page in pdf.pages: print…
kadrian
0 votes, 3 answers

newline in text extraction from pdf

I am coding a function that extracts text from a PDF, using the pyPdf library. Extraction works, but I am encountering a couple of problems, such as newlines being dropped. So I found a way to add a newline, and have done this: # Iterate…
Bazinga
0 votes, 1 answer

How to extract text from PDF uploaded in Google App Engine using PyPDF2?

Is there any way to extract text and documentInfo from PDF file uploaded via Google app engine? I want to use PyPDF2, and my code is this: pdf_file = self.request.POST['file'].file pdf_reader = pypdf.PdfFileReader(pdf_file) This gives me…
funkifunki
0 votes, 0 answers

how do I call up a pdf file on a specific bookmark in python?

I want to find a file in a folder, find a specific bookmark in that file, and when the right information is entered (for example "file 7, bookmark 5") the file will open to the right spot. (in python) I've found this: path_to_pdf =…
user3084455
0 votes, 2 answers

PYPDF watermarking returns error

Hi, I'm trying to watermark a PDF file using pypdf2, but I get an error and can't figure out what goes wrong. I get the following error: Traceback (most recent call last): File "test.py", line 13, in page.mergePage(watermark.getPage(0))…
user1949157
0 votes, 1 answer

PyPDF2 mergeTranslatedPage didn't merge pages in right way

I am trying to use PyPDF2 to merge 2 PDF pages into one. Here are example PDF files: http://ge.tt/9IvaIo01 But when I try to merge, I receive a copy of each page at the top and bottom. Here is a sample that demonstrates using mergeTranslatedPage on page 0 and page 1…
Darius
0 votes, 3 answers

renaming a list of pdf files with for loop

I am trying to rename a list of PDF files by extracting the name from the file using PyPdf. I tried to use a for loop to rename the files, but I always get an error with code 32 saying that the file is being used by another process. I am using…
chidimo
0 votes, 1 answer

Organizing PDFs in pyPDF

I have a question regarding Python and pyPdf. What I am attempting to do is create a PDF (obviously) and then have it ordered in a certain way, so that every time I run my script, it sorts it in a certain way for me, regardless of when the files…