Questions tagged [pypdf]

pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files, retrieve text and metadata from PDFs, and merge entire files together.

A Pure-Python library built as a PDF toolkit. It is capable of:

  • extracting document information (title, author, ...),
  • splitting documents page by page,
  • merging documents page by page,
  • cropping pages,
  • merging multiple pages into a single page,
  • encrypting and decrypting PDF files.

Because it is pure Python, it should run on any Python platform without depending on external libraries (optional features such as AES encryption and image handling need extra packages). It can also work entirely on in-memory streams (StringIO in the original Python 2 pyPdf, io.BytesIO in Python 3) rather than files, allowing PDF manipulation in memory. This makes it a useful tool for websites that manage or manipulate PDFs.

pypdf was inactive from 2010 to 2022; active maintenance resumed in December 2022.

Relationship to PyPDF2

PyPDF2 was a fork of pyPdf.

PyPDF2 received many updates in 2022, but it was then deprecated in favor of pypdf.

pypdf==3.1.0 is essentially the same as PyPDF2==3.0.0; only the package name was changed to pypdf.

See: https://pypdf.readthedocs.io/en/latest/meta/history.html


1451 questions
0 votes, 1 answer

pypdf for lists of pdfs

I have pypdf working fine for a single PDF file, but I cannot get it to work for a list of files, or in a for loop over multiple PDFs, without it failing because a string is not callable. Any ideas I can use as a work…
0 votes, 1 answer

Detect position of watermark in a pdf

I am on Ubuntu. I have a PDF file with pages divided into a grid. Each block of the grid contains the name/age/DOB/photo of a candidate. Some records have a "disqualified" watermark. I need to scrape this PDF, with disqualified candidates in a separate…
sulabh (1,097 reputation)
0 votes, 2 answers

How to obtain a file object from a variable or from http URL without actually creating a file?

I want to manipulate a downloaded PDF using PyPDF, and for that I need a file object. I use GAE to host my Python app, so I cannot actually write the file to disk. Is there any way to obtain the file object from a URL or from a variable that contains…
Alex Bausk (690 reputation)
0 votes, 1 answer

pyPdf: Speeding up the write / combine operation?

I've got a pyPdf application combining a bunch of PDFs into one PDF and properly building a table of contents using external metadata. It works really well for some PDFs, but for others, it just seems to hang and never actually write the PDFs. I…
SquidneyPoitier (413 reputation)
0 votes, 1 answer

Generating pdf using web2py-appreport (xhtmltopdf) in Python Web2py webapp

I am from a non-coding background, so Python and web2py are very new to me. My app needs to export textarea content (using the Redactor RTE) to PDF. I get HTML content from the textarea (Redactor); can you please advise me on how to use pyfpdf to generate a PDF…
Akash (19 reputation)
0 votes, 4 answers

Combine two lists of PDFs one to one using Python

I have created a series of PDF documents (maps) using Data Driven Pages in ESRI ArcMap 10. There is a page 1 and a page 2 for each map, generated from separate *.mxd files. So I have one list of PDF documents containing page 1 for each map and one list of…
-1 votes, 0 answers

PyPDF2 decrypt issue

I am trying to open PDFs that have an open password. When I try to open them, it tells me that the password is incorrect; however, when I check manually, the password is correct. Could you please help me? This is my code: def…
-1 votes, 0 answers

Program to count the number of words in a PDF file

I am trying something new and need your experience. I want to write a program that counts how many words are in a PDF file and shows the frequency of each word. Can I write such a program with pypdf? Thank you in advance.
Sss Ddd (1 reputation)
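Word frequencies can be computed by extracting each page's text with pypdf's `extract_text()` and counting with `collections.Counter`. This sketch assumes that a "word" is any run of `\w` characters, compared case-insensitively. It also assumes that the PDF has a text layer; scanned pages usually yield no text. `word_frequencies` and `pdf_word_frequencies` are hypothetical names.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count case-insensitive words; a 'word' here is a run of letters, digits, or underscores."""
    return Counter(re.findall(r"\w+", text.lower()))


def pdf_word_frequencies(path: str) -> Counter:
    """Extract the text of every page with pypdf, then count the words."""
    from pypdf import PdfReader  # imported here so word_frequencies() works without pypdf

    reader = PdfReader(path)
    # extract_text() may return an empty string for pages without a text layer (e.g. scans)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return word_frequencies(text)


counts = word_frequencies("The cat and the hat. The end.")
print(sum(counts.values()), counts.most_common(1))  # 7 [('the', 3)]
```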
-1 votes, 1 answer

How to Replace text in PDF using Python

I am trying to replace some text in a PDF using Python, and I have the following code. It updates the content of the PDF correctly while it is in memory, but the original content reappears when I write it to a file: def replace_text(content,…
user2586942 (73 reputation)
-1 votes, 1 answer

Insert tabular data at an (x, y) position in a PDF using Python

I need to insert tabular data into an existing PDF at a particular (x, y) position. How can I implement it? Which Python PDF library is best for this use case? The table data should support formatting such as bold and coloring.
akhil viswam (496 reputation)
-1 votes, 1 answer

When storing PDF text in a CSV file, how can I avoid the text spreading across multiple rows?

I am storing PDF text (extracted with pypdf) in a CSV file. The problem is that a few PDF files are very long, and for those files the text spreads across multiple rows instead of staying in a single row. How can I keep it in a single row? Here is my output…
boyenec (1,405 reputation)
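Extracted text often contains embedded newlines from `extract_text()`, and that is usually why it spills over. Here is a sketch of one fix, assuming you want each PDF's text on one physical line: collapse all whitespace before writing with the standard `csv` module. `to_single_line` and `write_text_rows` are hypothetical names.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def to_single_line(text: str) -> str:
    """Collapse every whitespace run, including newlines from extract_text(), into one space."""
    return " ".join(text.split())


def write_text_rows(rows: Iterable[Tuple[str, str]], stream: TextIO) -> None:
    """Write one CSV record per (filename, text) pair, each on a single physical line."""
    writer = csv.writer(stream)
    writer.writerow(["filename", "text"])
    for name, text in rows:
        writer.writerow([name, to_single_line(text)])


buf = io.StringIO()
write_text_rows([("a.pdf", "First line\nsecond line,\n  third")], buf)
print(buf.getvalue())
```

When writing to a real file, open it with `newline=""`, as the `csv` documentation recommends. `csv.writer` already quotes fields that contain newlines, so `csv.reader` would read such a field back as one record. Collapsing whitespace mainly helps tools that treat every physical line as a row.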
-1 votes, 1 answer

Extracting PDF Version using Python

I attempted to follow "Getting PDF Version using Python" to extract the version from a PDF file, but unfortunately it resulted in an error code. I'm new to Python and have no idea how to fix this. I can view the PDF file in something like Notepad and see…
Neil (3 reputation)
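A PDF file normally starts with a header such as `%PDF-1.7`, so the header version can be read from the first bytes without a PDF library. The following sketch uses only the standard library. Searching the first 1024 bytes, rather than only offset 0, is a lenient assumption. Since PDF 1.4, the document catalog may also carry a `/Version` entry that declares a later version than the header, so the header alone can understate it. Recent pypdf versions also expose the header as `PdfReader.pdf_header`. The function names below are hypothetical.

```python
import re
from typing import Optional

# A PDF normally begins with a header line such as "%PDF-1.7". Searching the first
# 1024 bytes instead of only offset 0 is a lenient assumption that tolerates leading junk.
_HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def header_version(data: bytes) -> Optional[str]:
    """Return the version string from a PDF header, or None if no header is found."""
    match = _HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def header_version_from_file(path: str) -> Optional[str]:
    with open(path, "rb") as fh:  # binary mode: PDFs are not text files
        return header_version(fh.read(1024))


print(header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))  # 1.7
```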
-1 votes, 1 answer

Extracting images from a PDF using PyPDF2 - but the pdf has no metadata

The PDF is a scanned image, and I have not yet found a way to pull out the images. I have tried methods including crop and media boxes, but they pull entire pages as images. I have also tried other parsing libraries like pdfminer.six, but…
-1 votes, 1 answer

How to iterate through pdf files and find the occurrences of a same list of specific words in each file?

I need help finding a list of specific words in many PDF files using Python. For example, I want to find the occurrences of the words "design" and "process" in two PDF files. The following is my code: output = [] count = 0 for fp in os.listdir(path): …
Fiona S (3 reputation)
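Here is a sketch of per-file counting for a fixed list of words. It uses whole-word, case-insensitive regular expressions, so "redesign" is not counted as "design". It assumes the text has already been extracted; the `texts_from_pdfs` helper shows one way to do that with pypdf. Both function names are hypothetical. If file names come from `os.listdir()`, note that it returns bare names, so join them with the directory (`os.path.join(path, fp)`) before opening.

```python
import re
from typing import Dict, Iterable, Mapping


def count_terms(texts: Mapping[str, str], terms: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Count whole-word, case-insensitive occurrences of each term in each file's text."""
    patterns = {t: re.compile(rf"\b{re.escape(t)}\b", re.IGNORECASE) for t in terms}
    return {
        name: {term: len(p.findall(text)) for term, p in patterns.items()}
        for name, text in texts.items()
    }


def texts_from_pdfs(paths: Iterable[str]) -> Dict[str, str]:
    """Extract the full text of each PDF with pypdf (imported lazily)."""
    from pypdf import PdfReader

    return {
        path: "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
        for path in paths
    }


sample = {
    "a.pdf": "Design first. The design process matters.",
    "b.pdf": "Nothing relevant here; redesign is not counted.",
}
print(count_terms(sample, ["design", "process"]))
# {'a.pdf': {'design': 2, 'process': 1}, 'b.pdf': {'design': 0, 'process': 0}}
```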
-1 votes, 1 answer

E: Unable to locate package pyPdf

I'm trying to install pyPdf using this command (sudo apt-get install pyPdf), but I'm getting the following:
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package pyPdf