Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. The name is historical: Fitz was the name given to MuPDF's graphics rendering library.

257 questions

1 answer

Crop an area of a PDF around annotated text using Fitz

Problem statement: read a PDF and search for a word. If the word is found, annotate it and crop an area around the annotated text from the PDF file. Each cropped image should contain only one annotation. Libraries and…

python pdf pymupdf

asked Aug 15 '20 at 17:27

Jacob Lawrence
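
One way to approach the cropping step described in the question above is to pad each hit's bounding box and clamp it to the page. The sketch below works on plain (x0, y0, x1, y1) tuples rather than PyMuPDF objects. The function name padded_clip, the padding value, and the sample coordinates are illustrative assumptions, not part of the library or of the original question.

```python
def padded_clip(hit, page, pad=20.0):
    """Grow the box `hit` by `pad` points on every side, clamped to `page`.

    Both arguments are (x0, y0, x1, y1) tuples with the origin at the
    top-left corner, the convention PyMuPDF uses for page coordinates.
    """
    hx0, hy0, hx1, hy1 = hit
    px0, py0, px1, py1 = page
    return (
        max(px0, hx0 - pad),
        max(py0, hy0 - pad),
        min(px1, hx1 + pad),
        min(py1, hy1 + pad),
    )


# Hypothetical values: an A4-sized page and one search hit near the left edge.
page_box = (0.0, 0.0, 595.0, 842.0)
hit_box = (10.0, 100.0, 80.0, 112.0)
clip = padded_clip(hit_box, page_box)
print(clip)
```

With PyMuPDF itself, a tuple like this would typically be converted to a fitz.Rect and passed as the clip argument when rendering the page to a pixmap, one clip per hit so that each image contains a single annotation.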

2 answers

Why does saving a file that I opened with fitz change its size?

I looked for what opening a file with fitz does to the file, but didn't find anything. The code is simple: import fitz; doc = fitz.open('a.pdf'); doc.save('b.pdf'). What I don't understand is why this changes the PDF's size. With the file I tried, its…

python pdf metadata filesize pymupdf

asked Jun 23 '20 at 00:44

José Chamorro

3 answers

PyMuPDF insertTextBox inserting text but in mirrored form

import fitz text_rectangle = fitz.Rect(450,20,550,120) file_handle = fitz.open(input_file) first_page = file_handle[0] text = 'SAS Automation' first_page.insertTextbox(text_rectangle, f'{text}') file_handle.save(output_file) Above code adds text in…

python pymupdf

asked Jun 18 '20 at 07:42

Liyakat Shaikh

1 answer

Can text be searched block-wise in a PDF using PyMuPDF?

page.getTextBlocks() Output [(42.5, 86.45002746582031, 523.260009765625, 100.22002410888672, TEXT, 0, 0), (65.75, 103.4000244140625, 266.780029296875, 159.59010314941406, TEXT, 1, 0), (48.5, 86.123456, 438.292048492, 100.92920404974, TEXT, 0,…

python pdf text-search pymupdf

asked Jun 11 '20 at 10:26

Lav Mehta
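
A block-wise search over output shaped like the one in the question above can be sketched without touching the PDF again. The example assumes each block is a 7-tuple (x0, y0, x1, y1, text, block_no, block_type) with block_type 0 marking a text block. The function name find_blocks and the sample block texts are made up for illustration.

```python
def find_blocks(blocks, needle):
    """Return (block_no, bbox) for every text block containing `needle`.

    Each block is assumed to be a 7-tuple
    (x0, y0, x1, y1, text, block_no, block_type).
    The match is case-insensitive.
    """
    needle = needle.lower()
    matches = []
    for x0, y0, x1, y1, text, block_no, block_type in blocks:
        # block_type 0 marks a text block; other block types are skipped.
        if block_type == 0 and needle in text.lower():
            matches.append((block_no, (x0, y0, x1, y1)))
    return matches


# Hypothetical sample data in the same shape as the block list.
sample_blocks = [
    (42.5, 86.45, 523.26, 100.22, "Invoice number 1234\n", 0, 0),
    (65.75, 103.40, 266.78, 159.59, "Total amount due\nPayable in 30 days\n", 1, 0),
]
hits = find_blocks(sample_blocks, "total")
print(hits)
```

Returning the bounding box alongside the block number keeps the result usable for later highlighting or cropping.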

2 answers

Is there any solution to extract a borderless table from PDF to CSV?

This is my example image from pdf file with 75 pages.

python tabula pymupdf

asked Jun 08 '20 at 07:49

DataEngineer_Developer

3 answers

PyMuPDF: how do I remove annotations?

I am using PyMuPDF and trying to loop through a list of strings and highlight them before taking an image and moving to the next string. The code below does what I need but the annotation remains after each loop and I would like to remove them…

python-3.x pymupdf

asked May 22 '20 at 04:27

ajcnzd

0 answers

How can I correctly add the alpha channel to an image extracted from a PDF using PyMuPDF

I am trying to extract images from a PDF using PyMuPDF and this recipe. For some images with a hard edge transparency it works. But for images with a matte transparency, I get artifacts along the edges. When I extract the image without alpha…

python-3.x png pymupdf

asked Apr 17 '20 at 13:07

Simon

3 answers

PyMuPDF insert image at bottom

I'm trying to read a PDF and insert an image at the bottom (footer) of each page. I've tried the PyMuPDF library. Problem: whatever Rect (height, width) I give, the image doesn't appear at the bottom; it keeps appearing only in the top half of…

python-3.x pdf pdf-generation pymupdf

asked Mar 16 '20 at 18:30

Rohit Nimmala
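
The rectangle arithmetic for a footer image, as asked about above, can be sketched in plain Python. The example assumes a top-left origin with y growing downward, so the bottom of the page has the largest y values. The function name footer_rect, the margin, and the sample sizes are illustrative assumptions. Whether this addresses the behaviour described in the question is not something the excerpt settles.

```python
def footer_rect(page_width, page_height, img_width, img_height, margin=20.0):
    """Return (x0, y0, x1, y1) for an image centred at the bottom of a page.

    Assumes a top-left origin with y growing downward, so the footer sits
    just above y = page_height.
    """
    x0 = (page_width - img_width) / 2
    y1 = page_height - margin
    y0 = y1 - img_height
    return (x0, y0, x0 + img_width, y1)


# Hypothetical sizes in points: an A4-sized page and a 200 x 50 image.
rect = footer_rect(595.0, 842.0, 200.0, 50.0)
print(rect)
```

Note that a rectangle is given by two corner points rather than by a height and a width, which is an easy thing to mix up when positioning an image.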

2 answers

How to Install PyMuPDF on Heroku Django

I am trying to make a script that extracts images from a PDF. I have written the script in a Django project and added pymupdf to requirements.txt. I have an Aptfile with Mupdf in it and https://github.com/heroku/heroku-buildpack-apt as a buildpack…

python django pdf heroku pymupdf

asked Feb 10 '20 at 12:57

Raghav Saraf

1 answer

Problem highlighting text in a PDF document with Python

I am trying to write a Python script that automates finding text in a PDF and highlighting it accordingly. I am using the pymupdf module. It works for some PDFs. However, when for the target PDF (drawing of components and property…

python pdf annotations pymupdf

asked Nov 07 '19 at 23:00

user12140050

0 answers

Tkinter Canvas PDF Viewer Next Page Render Works Only When Debugging

I am trying to write a PDF viewer in Python/Tkinter using the PyMuPDF library. I can successfully open the document and render the first page, but when attempting to move to the next page by deleting the Canvas image and creating a new one from the…

python python-3.x tkinter tkinter-canvas pymupdf

asked Aug 04 '19 at 15:34

PercyODI

1 answer

Python PyMuPDF Fitz insertImage

I have been trying to put an image into a PDF file using PyMuPDF / Fitz. Everywhere I look on the internet I find the same syntax, but when I use it I get a runtime error. >>> doc = fitz.open("NewPDF.pdf") >>> page = doc[1] >>> rect =…

python image pdf-generation jpeg pymupdf

asked Mar 17 '18 at 18:32

AlexJ

0 answers

Extract details from unstructured pdfs either in table or any other format

I tried to extract grant payable org details from PDFs which have a fixed format but a varying number of pages. I have spent a lot of time with libraries like PyPDF2, PyMuPDF, Tabula, SpaCy, NLTK, etc. but still no luck. It would be a great help if…

python html pdf pypdf pymupdf

asked Sep 01 '23 at 06:05

Umesh Kumar

1 answer

Correctly extract PDF within PDF - Python

I have a PDF embedded in a PDF. I've tried multiple ways of extracting it, but when I save it I get back the same original PDF (with the embedded one). I only want to get the embedded PDF. I'm open to doing it in another programming language, the only…

python pdf pypdf pymupdf

asked Aug 30 '23 at 15:53

ilia

1 answer

Remove the garbage words from a PDF

I am extracting text from a PDF using Python and libraries such as fitz, pdfreader, and so on. But my PDF contains some schematics and words that I do not need. Here is an example. When extracting the text, the words of the schematics are also…

python pdf pdf-reader pymupdf pdfplumber

asked Aug 30 '23 at 10:22

Muhammad Samadzade
