Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).
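
As a rough illustration of the extension list above, the following minimal sketch checks whether a file name ends in one of those extensions. The names SUPPORTED_EXTENSIONS and has_supported_extension are hypothetical helpers made up for this example; they are not part of PyMuPDF, and the check only looks at the file name, not at the file contents.

```python
from pathlib import Path

# Extensions taken from the tag description above.
# This set is an illustration, not something exported by PyMuPDF.
SUPPORTED_EXTENSIONS = frozenset({".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"})


def has_supported_extension(filename: str) -> bool:
    """Return True if the file name ends in one of the listed extensions.

    The comparison is case-insensitive and inspects only the name,
    so it says nothing about whether the file can actually be opened.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For instance, has_supported_extension("book.EPUB") is true, while has_supported_extension("notes.txt") is false. A check like this could be used to filter a directory listing before handing files to the library.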

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. This has a historical reason.

257 questions

0 answers

Python script stops with no error message

I am running quite a large script that works through some PDF documents using PyMuPDF. When I encounter an error, no error message appears; the script just stops running. When I trace the execution of the script, it stops at quite random places. 2 times it…

python pymupdf

asked Mar 23 '23 at 14:48

Nursk

0 answers

How to discard cropped information from a PDF (python pypdf2 pymupdf)

This issue arises because the entire PDF file cannot be disclosed to the client. After I crop the PDF, however, the content seems to be merely hidden: the file size is still the same and the cropped-out part can still be retrieved. First of all, I decided to share the…

python pypdf pymupdf

asked Mar 23 '23 at 05:24

Ray

0 answers

RuntimeError makes the program crash with no error message, even with a try-except clause

I am using Fitz in Python to work with PDF documents. One document I have sometimes raises a RuntimeError, and other times it doesn't, when I iterate over the pages. When I apply a try-except clause for it, the program just stops when it encounters…

python pdf runtime-error pymupdf

asked Mar 22 '23 at 09:53

Nursk

1 answer

Can't extract bullet points from PDF using Python fitz

I'm trying to extract all data from PDFs. I also want to identify bullet points in the PDF, but in that case I'm getting bullet points that, when I manually copy them from the PDF and paste them somewhere else, are pasted as the string (image.png) for all the…

python text pymupdf

asked Mar 15 '23 at 13:51

Santiago

0 answers

Get text based on coordinates in the same format as in the PDF

I have coordinate details, but I'm unable to find any method in pymupdf to fetch a block of data based on the coordinates. Is there any method in pymupdf that can do this? I'm open to other libraries, though I already used PDFQuery which is not…

python coordinates pymupdf

asked Mar 13 '23 at 13:36

m9m9m

0 answers

Highlight a paragraph in PDF using Fitz

I am trying to use Fitz to highlight text in a PDF document. I can highlight an individual word quite easily, but I am trying to highlight the whole paragraph that the word appeared in. Is this possible using fitz? I cannot find any information on…

python pymupdf

asked Mar 06 '23 at 19:01

Jim

0 answers

Speed Performance of pytesseract on PDF (comparing to existing pdf-ocr library in python)

I am a beginner at OCR projects and am currently looking into different ways in Python to get the OCR-ed text from a PDF. One simple and popular way seems to be the pytesseract library, after first converting the PDF file into PNG/JPG. I also tried libraries…

pdf ocr tesseract python-tesseract pymupdf

asked Mar 06 '23 at 05:35

Yukititit

0 answers

Simulate Overprint with PyMuPDF

So I'm familiar with Python but not exactly an expert. What I've been doing is looking into tools that I can use to convert an existing PDF into a PNG of higher quality/zoom, draw the trim & bleed boxes, and simulate overprinting. I want to make…

python pdf pymupdf

asked Mar 01 '23 at 20:31

Xanodus

0 answers

Avoid creation of a temporary file due to write restrictions on web servers

I have code that uses two independent packages (let's call them packageA and packageB). PackageA has a function write(outputPath: str, ...) that writes some data to disk as a ".pdf" file. PackageB includes a method called read(inputPath: str, ...) that…

python-3.x web-services io file-writing pymupdf

asked Feb 24 '23 at 01:13

Kikolo

1 answer

cannot write mode PA as PNG

pdf_file=fitz.open(r"C:\Users\user\Downloads\example.pdf") for page_index in range(len(pdf_file)): page=pdf_file[page_index] print(page.get_pixmap()) OSError: cannot write mode PA as PNG How can I get images from a PDF file…

python pdf pymupdf

asked Feb 23 '23 at 10:47

YAŞAR EMRE DOĞRU

1 answer

How to detect drawings and get their size from a pdf using python?

Basically, I want to detect and get the bounding box of the figures or drawings in a PDF using Python. As per the image, I just want the bounding box of the figure right below the question, but it also detects the…

python python-3.x opencv pymupdf pdfpages

asked Feb 22 '23 at 06:38

Shivam Tripathi

1 answer

Reading a pdf in AWS lambda using PyMuPDF

I am trying to read a PDF in AWS Lambda. The PDF is stored in an S3 bucket. I need to extract the text from the PDF and translate it into any required language. I am able to run my code in my notebook, but when I run it on Lambda I get this error…

python-3.x amazon-web-services aws-lambda aws-lambda-layers pymupdf

asked Feb 19 '23 at 18:05

self.Fool

1 answer

Reading stream as image in PDF file with PyMuPDF

I want to read the info (width, height and DPI) from an image embedded in a PDF file with only one page. I'm using PyMuPDF: import fitz pdf_file = fitz.open(filepath) for page in pdf_file: images = page.get_images() # returns an empty list []…

python-3.x pymupdf

asked Feb 15 '23 at 14:58

Márcio Duarte

0 answers

Cannot read an image inside a PDF using PyMuPDF and pytesseract

This is my code: import fitz from PIL import Image import pytesseract # Open the PDF file using PyMuPDF pdf_file = fitz.open("file") # Iterate through all the pages in the PDF text_list = [] for page_number in range(pdf_file.page_count): …

python python-imaging-library ocr python-tesseract pymupdf

asked Feb 10 '23 at 19:31

Diego Kenny

1 answer

Highlight text in a PDF file without using search_for()

I would like to highlight text in my PDF file using the PyMuPDF library. The method search_for() returns the location of the searched words. The problem is that this method ignores spaces and upper/lower case; it works only for ASCII characters. Is there any…

python highlight pymupdf

asked Feb 09 '23 at 20:35

user34088
