Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”.

MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).
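
As a small illustration of the extension list above, the following sketch checks whether a file name carries one of those extensions before it is handed to the library. The names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical helpers written for this example; they are not part of PyMuPDF, and an extension match does not guarantee that the file can actually be opened.

```python
from pathlib import Path

# Extensions copied from the tag description; this set is an illustration,
# not something exported by PyMuPDF itself.
SUPPORTED_EXTENSIONS = {".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is in the list above (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported("book.EPUB"))  # True
print(is_supported("notes.txt"))  # False
```

A check like this only filters by name; whether the document really opens still depends on the file's content.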

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. This has a historical reason.

257 questions
0 votes
0 answers

Python script stops with no error message

I am running quite a large script that works through some PDF documents using PyMuPDF. When I encounter an error, no error message appears; the script just stops running. When I trace the execution of the script, it stops at quite random places. 2 times it…
Nursk
0 votes
0 answers

How to discard cropped information from a PDF (python pypdf2 pymupdf)

This issue arises because the entire PDF file cannot be disclosed to the client. After I crop the PDF, however, the cropped part seems to be merely hidden: the file size is still the same and the cut-out part can still be retrieved. First of all, I decided to share the…
Ray
0 votes
0 answers

RunTimeError, make program crash with no error with try-except clause

I am using fitz in Python to work with PDF documents. One document I have sometimes raises a RuntimeError, and other times does not, when I iterate over the pages. When I wrap it in a try-except clause, the program just stops when it encounters…
Nursk
0 votes
1 answer

Can't take bulletpoints from PDF using python fitz

I'm trying to extract all data from PDFs. I also want to identify bullet points in the PDF, but when I manually copy the bullet points from the PDF and paste them somewhere else, it just pastes the string (image.png) for all the…
Santiago
0 votes
0 answers

Get text based on coordinates as same format as in PDF

I have coordinate details, but I'm unable to find any method in pymupdf to fetch a block of data based on the coordinates. Is there any method in pymupdf that can do this? I'm open to other libraries, though I already used PDFQuery which is not…
m9m9m
0 votes
0 answers

Highlight a paragraph in PDF using Fitz

I am trying to use Fitz to highlight text in a PDF document. I can highlight an individual word quite easily, but I am trying to highlight the whole paragraph that the word appeared in. Is this possible using fitz? I cannot find any information on…
Jim
0 votes
0 answers

Speed Performance of pytesseract on PDF (comparing to existing pdf-ocr library in python)

I am a beginner at OCR projects and am currently looking into different ways in Python to get the OCR-ed text from a PDF. One simple and popular way seems to be the pytesseract library, after first converting the PDF file into PNG/JPG. I also tried libraries…
Yukititit
0 votes
0 answers

Simulate Overprint with PyMuPDF

So I'm familiar with Python but not exactly an expert. What I've been doing is looking into tools that I can use to convert an existing PDF into a PNG of higher quality/zoom, draw the trim & bleed boxes, and simulate overprinting. I want to make…
Xanodus
0 votes
0 answers

Avoid creation of a temporary file due to write restrictions on web servers

I have code that uses two independent packages (let's call them packageA and packageB). PackageA has a function write(outputPath: str, ...) that writes some data to disk as a ".pdf" file. PackageB includes a method called read(inputPath: str, ...) that…
Kikolo
0 votes
1 answer

cannot write mode PA as PNG

pdf_file = fitz.open(r"C:\Users\user\Downloads\example.pdf")
for page_index in range(len(pdf_file)):
    page = pdf_file[page_index]
    print(page.get_pixmap())
OSError: cannot write mode PA as PNG
How can I get images from a PDF file…
0 votes
1 answer

How to detect drawings and get their size from a pdf using python?

Basically, I want to detect and get the bounding boxes of the figures or drawings in a PDF using Python. As per the image, I just want the bounding box of the figure right below the question, but it also detects the…
0 votes
1 answer

Reading a pdf in AWS lambda using PyMuPDF

I am trying to read a PDF in AWS Lambda. The PDF is stored in an S3 bucket. I need to extract the text from the PDF and translate it into any required language. I am able to run my code in my notebook, but when I run it on Lambda I get this error…
0 votes
1 answer

Reading stream as image in PDF file with pyMuPDF

I want to read the info (width, height and DPI) from an image embedded in a PDF file with only one page. I'm using PyMuPDF:
import fitz
pdf_file = fitz.open(filepath)
for page in pdf_file:
    images = page.get_images()  # returns an empty list []…
0 votes
0 answers

Cannot read an image inside a PDF using PyMUpdf and pytesseract

This is my code:
import fitz
from PIL import Image
import pytesseract
# Open the PDF file using PyMuPDF
pdf_file = fitz.open("file")
# Iterate through all the pages in the PDF
text_list = []
for page_number in range(pdf_file.page_count): …
0 votes
1 answer

highlight Text in pdf file without using search_for()

I would like to highlight text in my PDF file using the PyMuPDF library. The method search_for() returns the location of the searched words. The problem is that this method ignores spaces and upper/lower case, and it works only for ASCII characters. Is there any…
user34088