Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. It can retrieve text and metadata from PDFs and merge entire files.

pypdf is a pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries. It can also work entirely on in-memory stream objects (StringIO in Python 2, io.BytesIO in Python 3) rather than files on disk, allowing for PDF manipulation in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022. Maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received a lot of updates in 2022, but it has since been deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html


1451 questions
-1
votes
1 answer

Converting PDF document to DataFrame

I have a PDF document with 388 pages and 1 table per page. I am trying to convert them to Excel or to multiple DataFrames, but I am having some difficulties. I have tried the PyPDF2 and tabula libraries, but extraction stops after only one page. The…
Equan Ur Rehman
-1
votes
1 answer

How to generate a pdf file using Word Template in Python?

I need to generate a PDF file in Python. I have created a Word template file (Word XML document) with placeholders. I populate the placeholders dynamically and can create a Word document. However, I would like to convert this…
Kiran
-1
votes
2 answers

I need to extract text from a PDF file and put it in a new .txt file

I need help with a Python script that reads a PDF file and copies every word in it into a new .txt file (one word per line), then deletes the repeated words, counts them, and prints the count on the last line.
-1
votes
1 answer

PyPDF2 returns negative dimension

I use PyPDF2 to get the page dimensions of a PDF file, but it returns negative numbers for some PDFs. Why? Here is an example: starting from the second page, the real height is negative. from PyPDF2 import PdfFileReader input_file = PdfFileReader(open('file.pdf',…
Vahagn
-1
votes
1 answer

Reading in a PDF file and filter content using a regex

I am trying to filter a PDF file using a regex so that the output is only the word the regex matches. Here is my code: # FILTER PDF CONTENT FOR PHI USING REGEX import PyPDF2 import re # creating a pdf file object pdfFileObj =…
James Davinport
-1
votes
1 answer

Python Select PDF File Parts and Merge into One

I have many PDF files in one folder. Two of the files have the same name and end with _01 and _03 (these numbers could be random). I want to merge the two files that share a name into one PDF with the name the parts have. Any idea? Like story-writing_01.pdf,…
Mazhar Ali
-1
votes
1 answer

Does this PDF contain PostScript?

I am using PyPDF2 to read a PDF file with some line drawings, with code like the following: from PyPDF2 import PdfFileReader with open('temp.pdf','rb') as f: pdf = PdfFileReader(f) for page in pdf.pages: print page['/Contents'].getData() I see…
djvg
-1
votes
1 answer

Python 3: message error when I try to open a pdf

I'm having issues with code that used to work for weeks. The problem comes from this part of my code: TypeError: ifile = open('0_Inputs/CompaniesList.csv', "r", encoding = 'utf-8') I got the following message: open() got an unexpected keyword…
B-T
-1
votes
2 answers

Python PyPDF2 seek of closed file Error

I am making a PDF splitter, and at first it seemed to work fine. But when I tried to use multiple page regions, I kept getting this error: ValueError: seek of closed file. If I omit pdf_file.close() the error stops, but all the PDFs created will…
-1
votes
1 answer

Python: TypeError: expected str, bytes or os.PathLike object, not PdfFileReader

I have the following code. This is just a starting point. Later on I'd like to replace the static "Hello Word" text with items from a CSV file that I read, looping through every item in the CSV. I want the watermark on every page. # importing…
f0rd42
-1
votes
1 answer

PyQt file browser - how to work with that file?

I am writing my first app in Python using PyQt5. All my icons and main scripts are working correctly, and I have all modules imported. Now I need to connect everything together. Here I have the biggest problem. When I click on an icon it opens a file browser;…
memo747
-1
votes
1 answer

Python - PyPDF2 misses large chunk of text. Any alternative on Windows?

I tried to parse a PDF file with PyPDF2, but I only retrieve about 10% of the text. For the remaining 90%, PyPDF2 brings back only newlines... a bit frustrating. Would you know any alternatives in Python that run on Windows? I've heard of…
Shimuno
-1
votes
1 answer

Why are PyPDF2 and reportlab removing spaces when inserting text?

I am trying to insert formatted text into the last page of my PDF. I am using the PyPDF2 and reportlab libraries to do this. I am using Python 2.7. For some reason the text gets inserted without spaces and on a new line for every character (not for…
Imozeb
-1
votes
2 answers

Extracting text from a PDF file using Python 2.7 on Windows 7

I have been using PyPDF2 to extract the text included in this PDF file (generated with pdfTeX-1.40.0) using Python 2.7. It works fine, but now I have to extract text from the same PDF generated with LibreOffice 4.3, and the result is this (not whole): ˜ !…
Budlog
-1
votes
1 answer

PyPDF2 File reader returns data without spaces

When I try to read a PDF file that contains tabular data using the code below, there is no space between the two columns or rows. import PyPDF2 pdfFileObj = open('filename.pdf', 'rb',) pdfReader =…
Aditya Rao