Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).
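As a rough illustration of the extension list above, a script could screen file names before handing them to PyMuPDF. The helper `is_supported` and the constant `SUPPORTED_EXTENSIONS` are made up for this sketch. The set only mirrors the extensions listed here, not whatever a given PyMuPDF version actually accepts.

```python
from pathlib import Path

# Extensions copied from the list above; a hypothetical constant,
# not something PyMuPDF itself exposes.
SUPPORTED_EXTENSIONS = {".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"}


def is_supported(filename):
    """Return True if the file name ends in one of the listed extensions."""
    # Compare case-insensitively so "BOOK.EPUB" is treated like "book.epub".
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Checking only the extension says nothing about whether the file content is valid, so opening the document can still fail.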

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. This has a historical reason.
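A minimal sketch of that import, written so it also runs where PyMuPDF is not installed. The `available` flag is an assumption of this example, not part of the library. The `hasattr` check is a cautious guess at how to tell PyMuPDF apart from another module that happens to be named `fitz`.

```python
# The import name (fitz) differs from the package name (PyMuPDF).
try:
    import fitz
except ImportError:
    fitz = None  # PyMuPDF is not importable in this environment

# Hypothetical flag: True only if a module named fitz was imported
# and it offers an open() function, as PyMuPDF does.
available = fitz is not None and hasattr(fitz, "open")
print("PyMuPDF available:", available)
```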

257 questions
2
votes
1 answer

Extracting text in known bbox from pdf, PDFQuery too slow

I've found the bbox coordinates in the lxml file and managed to extract the wanted data with PDFQuery. Then I write the data to a csv file. def pdf_scrape(pdf): """ Extract each relevant information individually input: pdf to be scraped …
NOVEREI
  • 23
  • 5
2
votes
1 answer

Use PyMuPDF to bold parts of text

I am trying to use PyMuPDF to bold portions of each word in a PDF file. So, for example, a file with the string "There are many pies" will result in "There are many pies" I have seen that you can use Page.get_textpage().extractWORDS() to sort of…
JetSetTime
  • 25
  • 3
2
votes
2 answers

Delete text from pdf using PyMUPDF

I need to remove the text "DRAFT" from a pdf document using Python. I can find the text box containing the text but can't find an example of how to edit the pdf text element using pymupdf. In the example below the draft object contains the coords…
user3005422
  • 41
  • 2
  • 5
2
votes
1 answer

Save PDF file as images with same quality as original PDF

I want to save each page of a pdf file as a single image file: import fitz doc = fitz.open('file.pdf') for i in range(doc.page_count): page = doc[i] pix = page.get_pixmap() pix.save(f'page-{i}.png') pix.pil_save(f'page-{i}.jpg',…
beenieman
  • 39
  • 1
  • 6
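One factor that may matter for a question like this is resolution: PDF page sizes are given in points of 1/72 inch, so rendering at a higher resolution means scaling by `dpi / 72`. The sketch below only does that arithmetic. The helper names `zoom_for_dpi` and `pixel_size` are invented here, and the pixel sizes are approximate, since a renderer may round differently.

```python
PDF_POINTS_PER_INCH = 72  # default PDF unit: 1 point = 1/72 inch


def zoom_for_dpi(dpi):
    """Return the scale factor that maps 72 dpi to the requested dpi."""
    return dpi / PDF_POINTS_PER_INCH


def pixel_size(width_pt, height_pt, dpi):
    """Approximate pixel dimensions of a page rendered at the given dpi."""
    return (round(width_pt * dpi / PDF_POINTS_PER_INCH),
            round(height_pt * dpi / PDF_POINTS_PER_INCH))


# With PyMuPDF installed, the factor would be used roughly like this:
#     zoom = zoom_for_dpi(300)
#     pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
```

For a US Letter page (612 x 792 points), `pixel_size(612, 792, 300)` gives 2550 x 3300 pixels.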
2
votes
1 answer

Page orientation in PyMuPDF

I am trying to extract text from some Pdfs. For this purpose I am using PyMuPDF library (1.19.2) in Python. I am however having some trouble understanding the orientation of pages and images in the Pdfs. When I look at the PDF in Adobe reader, the…
2
votes
1 answer

How do I resolve "No module named 'frontend'" error message on Google Cloud Function

I'm trying to deploy a cloud function with Python 3.9 but when I run gcloud functions deploy my_function --project my_project --runtime python39 --trigger-resource bucket_name --trigger-event google.storage.object.finalize the deploy fails with…
Patrick
  • 33
  • 1
  • 5
2
votes
1 answer

Add Bookmarks to pdf using Pymupdf

How to add Bookmarks to pdf using Pymupdf. I have seen many ways using PyPDF2 but since I'm already using pymupdf for other annotations I would prefer pymupdf for adding bookmarks. Also would like to highlight the text and add bookmarks to it.
Nayana Madhu
  • 1,185
  • 5
  • 17
  • 34
2
votes
0 answers

How to replace text in hidden text layer of pdf?

I have to remove sensitive information from pdf. I want to do this in both the image layer and the text layer. I managed to get half the target result using the fitz library. This is the code I use, in a simplified form. phrase_to_redact =…
nietoperz21
  • 303
  • 3
  • 12
2
votes
2 answers

PyMuPDF Pixmap tobytes() returns attribute error

I'm following the documentation and using the latest PyMuPDF (1.18.13). However Pixmap.tobytes() isn't working for me: zoom = 2 # zoom factor mat = fitz.Matrix(zoom, zoom) pix = page.getPixmap(matrix = mat) stream =…
koopmac
  • 936
  • 10
  • 27
2
votes
2 answers

fitz.open() not working when in a for loop (FITZ,PYTHON,PYMUPDF)

I am getting stuck when trying to iterate through files in a directory ('PDFS') with fitz from PyMuPDF. The thing is, the code works when I am just doing document = "somepdf.pdf", but as soon as I insert a for loop and try to access files that way…
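The excerpt is cut off, so the actual cause is not known. One frequent reason for this symptom is that a directory listing yields bare file names, which are then opened relative to the current working directory instead of the folder being scanned. A minimal sketch that builds full paths first (the helper name `pdf_paths` is hypothetical):

```python
from pathlib import Path


def pdf_paths(directory):
    """Return the full paths of the .pdf files in directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# With PyMuPDF installed, each document would then be opened roughly like this:
#     for path in pdf_paths("PDFS"):
#         doc = fitz.open(str(path))
```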
2
votes
3 answers

mupdf: Can't do incremental writes when changing encryption

I'm trying to add table of contents to the pdf using fitz package. Here's my script doc = fitz.open(path) bookmarks = [[1, 'INTRODUCTION', 1], [1, 'MANUSCRIPT COMPONENTS', 1], [1, 'MULTIMEDIA FIGURES – VIDEO AND AUDIO FILES', 2], [1, 'MATHEMATICAL…
Venkatesh Dharavath
  • 500
  • 1
  • 5
  • 18
2
votes
1 answer

How to reduce size of modified PDF using pymupdf

I'm editing a pdf by redacting certain words and adding different words on top of the redacted area in pymupdf. The code works successfully however it makes a very large single page pdf (9MB). I assume this is because of drawing many shapes and…
koopmac
  • 936
  • 10
  • 27
2
votes
0 answers

Python Text annotation with PyMuPDF

I'm using PyMuPDF for annotating some text in a .pdf document by using: import fitz import re def data_(text): annotation_text = r"(amet)" for line in text: if re.search(annotation_text, line, re.IGNORECASE): search =…
user3043636
  • 559
  • 6
  • 23
2
votes
1 answer

Python Image extraction sequence from pdf

I was trying to extract images from a pdf using PyMuPDF (fitz). My pdf has multiple images in a single page. I am maintaining a proper sequence number while saving my images. I saw that the images being extracted don't follow a proper sequence.…
Sabster
  • 89
  • 1
  • 12
2
votes
1 answer

How to split a PDF with PyMuPDF (with a loop)?

I'd like to use PyMuPDF to split a PDF so that each output file is named after its bookmark and contains only one page. I've successfully created my files, for example 4 PDF files for a 4-page source PDF, but in the resulting PDFs, I don't…