Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”.

MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. The name is historical: Fitz is the name of the graphics library that MuPDF is built on.
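As a rough illustration of the import name and the supported formats described above, here is a minimal sketch. The helper names is_supported and count_pages are hypothetical (they are not part of PyMuPDF), and count_pages assumes PyMuPDF is installed when it is called.

```python
from pathlib import Path

# Extensions listed in the tag description above.
SUPPORTED_EXTENSIONS = {".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"}


def is_supported(filename: str) -> bool:
    """Return True if the extension is one of the formats listed above."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def count_pages(filename: str) -> int:
    """Open a document with PyMuPDF and return its page count."""
    import fitz  # PyMuPDF; imported lazily so is_supported works without it

    if not is_supported(filename):
        raise ValueError(f"unsupported file type: {filename}")
    # fitz.open returns a Document, which can be used as a context manager.
    with fitz.open(filename) as doc:
        return doc.page_count
```

Something like count_pages("in.pdf") would then return the number of pages, provided the file exists and PyMuPDF is available.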

257 questions
0
votes
1 answer

How to save numpy array image to a page of pdf by using pymupdf?

doc = fitz.open() pdf = fitz.open("in.pdf") for page in pdf: pix = page.get_pixmap(matrix=fitz.Matrix(7, 7)) im = Image.frombytes("RGB", [pix.width, pix.height], pix.samples) im = cvtColor(array(im), COLOR_RGB2GRAY) page =…
R.Q Luo
  • 1
  • 1
0
votes
1 answer

How to get "Fast Web View" property value from pdf using python or any other source?

Is there a way to extract Fast Web View property value programmatically? Python would be preferred. Thanks Manohar
Manohar KM
  • 11
  • 1
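For context, "Fast Web View" is the name PDF viewers commonly use for a linearized PDF. A linearized file carries a linearization parameter dictionary with a /Linearized key near the start of the file, so one possible heuristic is to look for that key in the first 1024 bytes. The sketch below uses only the standard library; the function names are hypothetical, and a positive result is only a hint, since a file that was modified after linearization can still contain the key.

```python
def looks_linearized(data: bytes) -> bool:
    """Heuristic: PDF header and /Linearized key within the first 1024 bytes."""
    head = data[:1024]
    return b"%PDF-" in head and b"/Linearized" in head


def file_looks_linearized(path: str) -> bool:
    """Apply the heuristic to the first 1024 bytes of a file on disk."""
    with open(path, "rb") as handle:
        return looks_linearized(handle.read(1024))
```

Calling file_looks_linearized("in.pdf") would give a quick yes/no guess without parsing the whole document.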
0
votes
1 answer

Problems extracting files from a pdf with PyM

I want to extract images from a PDF file and save them as .png. I use the following Python code and PyMuPDF: import fitz import io from PIL import Image file = "pdf1.pdf" pdf_file = fitz.open(file) for page_index in range(len(pdf_file)): page =…
Erwin
  • 19
  • 4
0
votes
1 answer

Python: Show a PyMuPDF document with Flask

This is a very general question, yet I can't seem to find an answer anywhere. I have a Python program that manipulates documents using the PyMuPDF library, and I would then like to show them through Flask in an HTML tag. The data argument…
Clement Genninasca
  • 732
  • 1
  • 4
  • 14
0
votes
2 answers

Converting PDF to an image using PyMuPDF

I have attempted to use PyMuPDF to convert a PDF document to an image, so that I can use it in OpenCV. However, an attribute error comes up when I try to save the image, and I'm not sure how to get around it. import fitz pdf =…
0
votes
1 answer

how to delete a text layer using fitz?

This is a very straightforward issue. I added an invisible text layer using page.insert_text(). After saving the modified pdf, I can use page.get_text() to retrieve the created text layer. I would like to be able to eliminate that layer, but…
José Chamorro
  • 497
  • 1
  • 6
  • 21
0
votes
2 answers

Is a loop function a solution to this problem

I have the following code, taken and adapted from the Collection of Recipes of PyMuPDF. import fitz # the document to annotate doc = fitz.open("test3.pdf") # the text to be marked t = "lidiar con estas problemáticas" # work with first page…
Ramiro
  • 49
  • 11
0
votes
1 answer

FITZ insert_text "compressing" text layer in the bottom-left side of the pdf page

I've been struggling with this issue for a while now and I just don't know what's going on. My code is as messy as an amateur's code should be, but it usually works (except when it doesn't). The code below converts an ordinary pdf file into an ocr…
José Chamorro
  • 497
  • 1
  • 6
  • 21
0
votes
2 answers

extract the specific text from pdfs using python

I have tried different Python libraries to extract specific text from PDFs. I have to extract the text under the heading pdf1 from this PDF, starting from Case 1 to the bold diamond ◆. The next pdf contains the data in a…
0
votes
0 answers

How to export underlined text in a PDF using PyMuPDF?

I have a PDF document that has section numbers like 4.1.1 that are underlined. How would I go about using PyMuPDF to extract the text that is underlined?
shuynh84
  • 59
  • 8
0
votes
1 answer

How to identify strike-out text from PDF files using Python

I would like to extract only the strike-out text from a .pdf file. I have tried the below code, it is working with a sample pdf file I have. But it is not working with another pdf file which I think is a scanned one. Is there any standard way to…
0
votes
1 answer

For loop is getting its count lost somewhere

Sorry if this is at all confusing. I am very new to Python and am trying to get my foot in the door of the industry by automating simple tasks at the company I work for. This is a for loop designed to pull a specifically labeled page out of a pdf page matching…
0
votes
0 answers

Transform text contents of a PDF

I have a PDF with multiple text blocks which are misaligned. I am trying to generate a new PDF with aligned text as per my transformation matrix (known). I can use PyMuPDF (fitz) to extract the text information from the source PDF and insert the…
asymptote
  • 1,133
  • 8
  • 15
0
votes
1 answer

Extract Text from PDF using PyMuPDF

I am trying to extract text from a specific portion of a PDF file. From what I've found it sounds like PyMuPDF is the best option, and the below code came from the project's documentation. The problem is that the text that is extracted is not from…
jhoop2002
  • 33
  • 2
  • 8
0
votes
0 answers

Using PyMuPDF, returned fitz.Document object cannot be opened because "not picklable". Any recs?

I'm trying to read in a pdf and get the text from it. I'm new to using PyMuPDF, but I did follow code I saw online pretty much line for line. However, when I read in the document, I get a fitz.Document object that cannot be opened. Spyder returns…