Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

Supported files have the extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. The name is historical: Fitz is the name of MuPDF's graphics library.
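A minimal sketch of the import, assuming a recent PyMuPDF release in which the package can also be imported under the name pymupdf. The helper name load_pymupdf is hypothetical and not part of the library:

```python
import importlib


def load_pymupdf():
    """Return the PyMuPDF module, or None if it is not installed.

    Assumption: recent releases can be imported as "pymupdf",
    older ones only under the historical name "fitz".
    """
    for name in ("pymupdf", "fitz"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


fitz = load_pymupdf()
if fitz is None:
    print("PyMuPDF is not installed")
```

Binding the result to the name fitz keeps code written against the historical import working unchanged.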

257 questions

1 answer

PyMuPdf Bookmarks

I have a script that combines a bunch of PDFs into a single file using PyPDF2. It works, but it is really slow on the company network. I then tried PyMuPDF and it is 100 times faster, but bookmarks and metadata are not copied automatically. Is there an…

python bookmarks pymupdf

asked Jan 17 '23 at 03:46

Saverio Vasapollo

1 answer

List matches of page.search_for() with PyMuPDF

I'm writing a script to highlight text from a list of quotes in a PDF. The quotes are in the list text_list. I use this code to highlight the text in the PDF: import fitz #Load Document doc = fitz.open(filename) #Iterate over pages for page in…

python pymupdf

asked Nov 26 '22 at 09:44

SamVimes

1 answer

How to extract anchor text/words from every hyperlink in a PDF using Python?

I am trying to extract the hyperlinks on each page, together with their anchor text, from a PDF using the PyMuPDF library. I am able to extract the hyperlinks with their page numbers, but I could not extract the anchor text/words for each hyperlink. Can anyone help…

python pypdf pdfminer pymupdf pdf-extraction

asked Oct 03 '22 at 09:21

gagan lohar

0 answers

Maintaining the sequence of the extracted text and images from the PDF while scraping them in Python

I am trying to extract text and images from a PDF in Python using the PyMuPDF library. Unfortunately, I can't preserve the position of the images in the sequence. For example, an image is placed at the start of the page, but while extracting it, the image is…

python python-3.x pymupdf pdf-scraping

asked Sep 13 '22 at 06:51

Sourav Singh

1 answer

Extract all Images from PDF with Python, and retain their transparency

I see a number of solutions on the web and here for extracting images from a PDF with PyMuPDF, PyPDF2, and others, but they either fail to retain transparency information, use deprecated code that no longer works, or the questions have gone…

python pypdf pymupdf

asked Jul 30 '22 at 17:03

Chris Valentine

1 answer

Highlight numbers in a PDF using Python

I was able to highlight words in a PDF (using the below code). However, I would also like to highlight any number contained in the same PDF. How would you complement such code? import fitz # opening the pdf file my_pdf =…

python pdf numbers highlight pymupdf

asked Jul 29 '22 at 16:22

CelloRibeiro
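One possible starting point for the question above, as a sketch only: collect the numeric tokens from the page text first, then highlight each one. The pattern and the helper name find_numbers are assumptions for illustration, not taken from the question's code:

```python
import re

# Assumed pattern: digits, optionally grouped by "." or "," (e.g. 1,250.50).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def find_numbers(text):
    """Return the distinct numeric tokens in text, in order of first appearance."""
    seen = []
    for token in NUMBER_RE.findall(text):
        if token not in seen:
            seen.append(token)
    return seen


print(find_numbers("Invoice 42: total 1,250.50 due in 30 days, ref 42"))
# prints ['42', '1,250.50', '30']
```

Each token could then be passed to page.search_for() and the resulting rectangles to page.add_highlight_annot(). Note that a plain text search for a short number such as 42 would also match inside longer numbers such as 142.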

2 answers

python pymupdf - How to write something into a pdf form field (widget)

I'm using pymupdf and just trying to write some text into an already existing pdf form field (widget). I was able to identify the widget by its xref, and read its contents, but I don't know how to modify its field_value and save it back. I've tried…

python pdf-form pymupdf

asked Jul 25 '22 at 20:54

Max Iskram

1 answer

installing PyMuPDF in python 3.8 alpine

I am trying to install PyMuPDF in the official Python 3.8 alpine docker image. The dockerfile is like this: FROM python:3.8-alpine RUN apk add --update --no-cache \ gcc g++ \ libc-dev \ python3-dev \ build-base \ cairo-dev \ …

python-3.x docker alpine-linux pymupdf

asked Jun 30 '22 at 14:59

Raiyan

0 answers

How to press a button on a PDF form with Python?

I have a situation where I need to fill out a PDF form and then press a button in it (I need to press "Send" button and this sends the data to the system). From what I understand, pressing the button executes a JavaScript script on the form. I'm…

python pdf pymupdf

asked Jun 02 '22 at 10:45

Jerzy Głowacki

0 answers

How does one get the exact coordinates of text after running PyMuPDF's search_for?

Suppose I run PyMuPDF's search_for function: import fitz doc = fitz.Document(pdf_path) page = doc[pg] coords = page.search_for('foo', quads=True) First off, is this guaranteed to be the exact, minimal bounding rectangle of the text matched? -- I…

pymupdf

asked May 25 '22 at 22:58

Chris

1 answer

How to extract data from unstructured PDFs using PyMuPDF in Python?

I am following this guide on how to extract data from Unstructured PDFs using PyMuPDF. https://www.analyticsvidhya.com/blog/2021/06/data-extraction-from-unstructured-pdfs/ I am getting an AttributeError: 'NoneType' object has no attribute 'rect'…

python dataframe pymupdf

asked May 09 '22 at 16:14

shuynh84

1 answer

How to extract only a certain table from a PDF (invoice) that contains multiple tables in a structured format

How can I extract only one table from a PDF that contains multiple tables? I have tried using Amazon Textract, but the problem is that it gives me all the tables in that PDF as a CSV. But I need to extract only certain tables based on some…

pdf ocr pdftotext amazon-textract pymupdf

asked May 02 '22 at 11:37

Jyoti yadav

1 answer

How to get a file path using tkinter askopenfilename or other command?

I'm building a simple app that converts PDF to PNG. When I use: pdf_name = askopenfilenames(initialdir="/", title="Selecionar Arquivos") I get: print(pdf_name) ('C:/Users/user/Desktop/Apps/Python/Conversor img to pdf/file.pdf',) So, the ask…

python tkinter pymupdf

asked Apr 24 '22 at 18:06

Paulo Roberto
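The tuple printed in the question above is what askopenfilenames returns: one string per selected file. A minimal sketch of normalizing such a result, where the helper name selected_paths is hypothetical:

```python
from pathlib import Path


def selected_paths(selection):
    """Normalize a file-dialog result to a list of Path objects.

    Accepts the tuple of strings returned by askopenfilenames or the
    single string returned by askopenfilename. A cancelled dialog may
    yield an empty string or an empty tuple; both give an empty list.
    """
    if isinstance(selection, str):
        selection = (selection,) if selection else ()
    return [Path(p) for p in selection]


paths = selected_paths(("C:/Users/user/Desktop/file.pdf",))
print(paths[0].name)
# prints file.pdf
```

Indexing the tuple directly, as in pdf_name[0], gives the same first path as a plain string.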

0 answers

Convert PDF to HTML via PyMuPDF

For pages with tabular data in landscape format, the words in the HTML output overlap. For pages in portrait format, the conversion is successful. Any ideas how to fix that? [Here is an example with the converted pdf to html in landscape…

python html pymupdf

asked Apr 09 '22 at 23:40

Nick Tsagkarakis

1 answer

Case-sensitive PDF highlighting using PyMuPDF and re

The goal is a program that can take a PDF of a script as well as the name of a character and output a script with only that character's lines (or at least their name) highlighted. An example of the way these scripts are typically formatted: Here I…

python highlight python-re case-sensitive pymupdf

asked Feb 09 '22 at 18:45

deep_node
