Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

The supported formats correspond to the file extensions .pdf, .xps, .oxps, .cbz, .fb2 and .epub (so you can develop e-book viewers in Python).

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. This has a historical reason.
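
A minimal sketch of what that import looks like in practice. The helper name load_pymupdf is hypothetical, and the fallback to the module name pymupdf is an assumption that holds only for newer PyMuPDF releases; older releases provide fitz only.

```python
import importlib


def load_pymupdf():
    """Return the PyMuPDF module, or None if it is not installed.

    Hypothetical helper: tries the historical module name `fitz` first,
    then `pymupdf` (assumed to exist only in newer releases).
    """
    for name in ("fitz", "pymupdf"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


mupdf = load_pymupdf()
if mupdf is None:
    print("PyMuPDF is not installed")
else:
    print("PyMuPDF loaded from module", mupdf.__name__)
```

Code written against the returned module (for example mupdf.open(...)) then works regardless of which import name is available.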

257 questions

votes

1 answer

Extracting text in known bbox from pdf, PDFQuery too slow

I've found the bbox coordinates in the lxml file and managed to extract the wanted data with PDFQuery. Then I write the data to a csv file. def pdf_scrape(pdf): """ Extract each relevant information individually input: pdf to be scraped …

python pdf pdfminer pymupdf

asked Jun 07 '22 at 12:44

NOVEREI

votes

1 answer

Use PyMuPDF to bold parts of text

I am trying to use PyMuPDF to bold portions of each word in a PDF file. So, for example, a file with the string "There are many pies" will result in "There are many pies" I have seen that you can use Page.get_textpage().extractWORDS() to sort of…

python pdf highlight pymupdf

asked May 23 '22 at 23:20

JetSetTime

votes

2 answers

Delete text from pdf using PyMUPDF

I need to remove the text "DRAFT" from a pdf document using Python. I can find the text box containing the text but can't find an example of how to edit the pdf text element using pymupdf. In the example below the draft object contains the coords…

python pymupdf

asked Apr 27 '22 at 18:28

user3005422

votes

1 answer

Save PDF file as images with same quality as original PDF

I want to save each page of a pdf file as a single image file: import fitz doc = fitz.open('file.pdf') for i in range(doc.page_count): page = doc[i] pix = page.get_pixmap() pix.save(f'page-{i}.png') pix.pil_save(f'page-{i}.jpg',…

python pymupdf

asked Feb 10 '22 at 14:24

beenieman

votes

1 answer

Page orientation in PyMuPDF

I am trying to extract text from some PDFs. For this purpose I am using the PyMuPDF library (1.19.2) in Python. However, I am having some trouble understanding the orientation of pages and images in the PDFs. When I look at the PDF in Adobe Reader, the…

python pdf python-imaging-library pymupdf

asked Jan 12 '22 at 04:21

Deepak Dalakoti

votes

1 answer

How do I resolve "No module named 'frontend'" error message on Google Cloud Function

I'm trying to deploy a cloud function with Python 3.9 but when I run gcloud functions deploy my_function --project my_project --runtime python39 --trigger-resource bucket_name --trigger-event google.storage.object.finalize the deploy fails with…

python python-3.x pip google-cloud-functions pymupdf

asked Dec 09 '21 at 17:35

Patrick

votes

1 answer

Add Bookmarks to pdf using Pymupdf

How can I add bookmarks to a PDF using PyMuPDF? I have seen many ways of doing this with PyPDF2, but since I'm already using PyMuPDF for other annotations, I would prefer to use it for adding bookmarks too. I would also like to highlight the text and add bookmarks to it.

python-3.x pdf pymupdf

asked Nov 16 '21 at 07:06

Nayana Madhu

votes

0 answers

How to replace text in hidden text layer of pdf?

I have to remove sensitive information from pdf. I want to do this in both the image layer and the text layer. I managed to get half the target result using the fitz library. This is the code I use, in a simplified form. phrase_to_redact =…

python pdf pymupdf redaction

asked Sep 13 '21 at 12:11

nietoperz21

votes

2 answers

PyMuPDF Pixmap tobytes() returns attribute error

I'm following the documentation and using the latest PyMuPDF (1.18.13). However Pixmap.tobytes() isn't working for me: zoom = 2 # zoom factor mat = fitz.Matrix(zoom, zoom) pix = page.getPixmap(matrix = mat) stream =…

python pymupdf

asked May 18 '21 at 23:39

koopmac

votes

2 answers

fitz.open() not working when in a for loop (FITZ,PYTHON,PYMUPDF)

I am getting stuck when trying to iterate through files in a directory ('PDFS') with fitz from PyMuPDF. The thing is, the code works when I am just doing document = "somepdf.pdf", but as soon as I insert a for loop and try to access files that way…

python pdf pymupdf

asked Apr 15 '21 at 16:53

Leonardo Acioli Arruda Sampaio

votes

3 answers

mupdf: Can't do incremental writes when changing encryption

I'm trying to add table of contents to the pdf using fitz package. Here's my script doc = fitz.open(path) bookmarks = [[1, 'INTRODUCTION', 1], [1, 'MANUSCRIPT COMPONENTS', 1], [1, 'MULTIMEDIA FIGURES – VIDEO AND AUDIO FILES', 2], [1, 'MATHEMATICAL…

python pymupdf

asked Mar 15 '21 at 16:47

Venkatesh Dharavath

votes

1 answer

How to reduce size of modified PDF using pymupdf

I'm editing a pdf by redacting certain words and adding different words on top of the redacted area in pymupdf. The code works successfully however it makes a very large single page pdf (9MB). I assume this is because of drawing many shapes and…

python pymupdf

asked Feb 25 '21 at 01:41

koopmac

votes

0 answers

Python Text annotation with PyMuPDF

I'm using PyMuPDF for annotating some text in a .pdf document by using: import fitz import re def data_(text): annotation_text = r"(amet)" for line in text: if re.search(annotation_text, line, re.IGNORECASE): search =…

python pymupdf

asked Feb 01 '21 at 16:19

user3043636

votes

1 answer

Python Image extraction sequence from pdf

I was trying to extract images from a pdf using PyMuPDF (fitz). My pdf has multiple images in a single page. I am maintaining a proper sequence number while saving my images. I saw that the images being extracted don't follow a proper sequence.…

python pymupdf image-extraction

asked Dec 02 '20 at 19:27

Sabster

votes

1 answer

How to split a PDF with PyMuPDF (with a loop)?

I'd like to use PyMuPDF to split a PDF so that each split file is named after its bookmark and contains only one page. I have successfully created my files, for example 4 PDF files for a 4-page source PDF, but in the resulting PDFs, I don't…

python pymupdf

asked Nov 01 '20 at 21:27

Abou Ilyès
