Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”.

MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. This has a historical reason.

257 questions
0 votes, 1 answer

Find and mark words in a PDF EXCEPT some words python

I got this part of code: kwfile = fitz.open(filedialog.askopenfilename()) # the keywords PDF # the following extracts kwfile content as plain text across all pages: text = " ".join([page.get_text() for page in kwfile]) keywords =…
Furk276
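One possible approach to the question above is to filter the keyword list before searching the target PDF. The sketch below is only an illustration: `filter_keywords` and `mark_keywords` are hypothetical helper names, and it assumes the keywords are whitespace-separated. `Page.search_for` and `Page.add_highlight_annot` are PyMuPDF calls; note that `search_for` matches text occurrences, so short keywords may also hit inside longer words.

```python
def filter_keywords(text, excluded):
    """Return the unique words of `text` in order, skipping excluded words.

    Comparison is case-insensitive (an assumption made for this sketch).
    """
    skip = {w.lower() for w in excluded}
    seen = set()
    result = []
    for word in text.split():
        key = word.lower()
        if key in skip or key in seen:
            continue
        seen.add(key)
        result.append(word)
    return result


def mark_keywords(doc, keywords):
    """Highlight every hit of each keyword; `doc` is an open fitz.Document."""
    for page in doc:
        for word in keywords:
            for rect in page.search_for(word):
                page.add_highlight_annot(rect)
```

With PyMuPDF installed, `mark_keywords(fitz.open("target.pdf"), filter_keywords(text, ["the", "and"]))` would highlight everything except the excluded words; the file name and the excluded words are placeholders.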
0 votes, 1 answer

How to extract only a Rect object in PyMuPDF

I tried the solution from this thread: "Read specific region from PDF". Sadly, the following example from the thread by user Zach Young doesn't work for me. import os.path import fitz from fitz import Document, Page, Rect # For visualizing the…
von spotz
0 votes, 1 answer

Python: How to sort a list of Rect objects?

I made a PDF reader that searches for a specific value and makes a list. I use PyMuPDF, which is incredible. So now I have this list and I would like to sort it with the following logic: the first Rect is the top, left-most Rect; each following Rect is…
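Since the excerpt is cut off, the exact ordering rule is unknown. A minimal sketch, assuming plain reading order (top to bottom, then left to right), could sort on the `y0` and `x0` coordinates. `Box` is a hypothetical stand-in used so the example runs without PyMuPDF; `fitz.Rect` exposes the same `x0`, `y0`, `x1`, `y1` attributes.

```python
from collections import namedtuple

# Stand-in for fitz.Rect (same coordinate attribute names).
Box = namedtuple("Box", "x0 y0 x1 y1")


def sort_reading_order(rects):
    """Sort rectangles top to bottom, breaking ties left to right."""
    return sorted(rects, key=lambda r: (r.y0, r.x0))
```

Rectangles on the same visual row often differ slightly in `y0`, so real data may need the vertical coordinate rounded or bucketed before sorting.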
0 votes, 1 answer

Opening PDF within a zip folder fitz.open()

I have a function that opens a zip file, finds a pdf with a given filename, then reads the first page of the pdf to get some specific text. My issue is that after I locate the correct file, I can't open it to read it. I have tried to use a relative…
Ryan
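One way this might be handled is to read the archive member into memory and pass the bytes to PyMuPDF instead of a path. The sketch below assumes the member is matched by its base name; `read_member_bytes` and `first_page_text` are hypothetical names. `fitz.open(stream=..., filetype="pdf")` is the PyMuPDF call for opening a document from memory.

```python
import zipfile


def read_member_bytes(zip_source, member_name):
    """Return the raw bytes of the archive member whose base name matches.

    `zip_source` may be a path or a file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.filename.rsplit("/", 1)[-1] == member_name:
                return archive.read(info)
    raise FileNotFoundError(member_name)


def first_page_text(zip_source, member_name):
    """Extract the text of page 1 from a PDF stored inside a zip file."""
    import fitz  # PyMuPDF, imported lazily so the helper above has no dependency

    data = read_member_bytes(zip_source, member_name)
    with fitz.open(stream=data, filetype="pdf") as doc:
        return doc[0].get_text()
```

Nothing is extracted to disk here, so relative paths inside the archive no longer matter.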
0 votes, 0 answers

PySide6 shearing PDF file on window resize

I'm using Qt (PySide) to view PDFs (using the PyMuPDF library), but when I resize the window I get a shearing artifact. Here is a minimal example: import sys import fitz from PySide6.QtWidgets import QApplication, QLabel, QMainWindow,…
Matt Harrison
0 votes, 0 answers

Obtaining margin sizes of a pdf using PyMuPDF

Using PyMuPDF, is there any way to get the page margins? I mean the distance from the edge of the page to the nearest horizontal/vertical element, depending on whether it is left/right or top/bottom margin. Looking at the documentation I don't see…
Kikolo
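A rough way to estimate such margins is to compare the page rectangle with the bounding boxes of everything found on the page. The sketch below works on plain `(x0, y0, x1, y1)` tuples; `content_margins` is a hypothetical helper, not a PyMuPDF function. With PyMuPDF, the boxes could be collected from `page.get_text("blocks")` (the first four items of each tuple) and from the `"rect"` entry of each item in `page.get_drawings()`.

```python
def content_margins(page_rect, boxes):
    """Distance from each page edge to the nearest content box.

    `page_rect` and every item in `boxes` are (x0, y0, x1, y1) tuples.
    Returns a dict with left/top/right/bottom, or None if there is no content.
    """
    boxes = [b for b in boxes if b[2] > b[0] and b[3] > b[1]]  # drop empty boxes
    if not boxes:
        return None
    px0, py0, px1, py1 = page_rect
    return {
        "left": min(b[0] for b in boxes) - px0,
        "top": min(b[1] for b in boxes) - py0,
        "right": px1 - max(b[2] for b in boxes),
        "bottom": py1 - max(b[3] for b in boxes),
    }
```

For example, `content_margins(tuple(page.rect), [b[:4] for b in page.get_text("blocks")])` would consider text and image blocks only; which element types count as "content" is a choice the caller has to make.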
0 votes, 1 answer

Using bezier curves to draw a rectangle with rounded corners in PyMuPDF

I would like to use PyMuPDF to draw a rectangle with rounded corners in a pdf. Apparently, there are no particular methods for rounded rectangles. But I was wondering if Shape.draw_bezier() or Shape.draw_curve() could be used for that purpose,…
Kikolo
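A quarter circle can be approximated closely by one cubic Bezier curve whose control points sit at a distance of about 0.5523 times the radius from the arc's end points. The sketch below uses that constant to build the outline of a rounded rectangle as four lines and four curves. `rounded_rect_path` and `draw_rounded_rect` are hypothetical helpers; `Shape.draw_line` and `Shape.draw_bezier` are the PyMuPDF calls they would feed.

```python
import math

# Standard constant for approximating a quarter circle with one cubic Bezier.
KAPPA = 4.0 * (math.sqrt(2.0) - 1.0) / 3.0


def rounded_rect_path(x0, y0, x1, y1, radius):
    """Return 8 segments ("line" or "bezier") tracing a rounded rectangle.

    Coordinates follow PyMuPDF's convention (y grows downward).
    """
    r = min(radius, (x1 - x0) / 2.0, (y1 - y0) / 2.0)  # clamp to fit the box
    o = r * (1.0 - KAPPA)  # control-point offset measured from each corner
    top_l, top_r = (x0 + r, y0), (x1 - r, y0)
    right_t, right_b = (x1, y0 + r), (x1, y1 - r)
    bot_r, bot_l = (x1 - r, y1), (x0 + r, y1)
    left_b, left_t = (x0, y1 - r), (x0, y0 + r)
    return [
        ("line", top_l, top_r),
        ("bezier", top_r, (x1 - o, y0), (x1, y0 + o), right_t),
        ("line", right_t, right_b),
        ("bezier", right_b, (x1, y1 - o), (x1 - o, y1), bot_r),
        ("line", bot_r, bot_l),
        ("bezier", bot_l, (x0 + o, y1), (x0, y1 - o), left_b),
        ("line", left_b, left_t),
        ("bezier", left_t, (x0, y0 + o), (x0 + o, y0), top_l),
    ]


def draw_rounded_rect(shape, x0, y0, x1, y1, radius):
    """Feed the segments to a PyMuPDF Shape (from page.new_shape())."""
    for kind, *points in rounded_rect_path(x0, y0, x1, y1, radius):
        if kind == "line":
            shape.draw_line(*points)
        else:
            shape.draw_bezier(*points)
```

After `draw_rounded_rect(...)`, the usual `shape.finish(...)` and `shape.commit()` calls would stroke or fill the outline.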
0 votes, 1 answer

How can I disentangle seemingly different imported Python modules under the same version number?

I recently updated PyMuPDF/fitz and changed my code to use the new method names (see PyMuPDF > Deprecated Names). Problem: when I call a function I wrote to use fitz's Page.get_text() it…
0 votes, 1 answer

Creating and then modifying pdf file in python

I am writing some code that merges some PDFs from their file paths and then writes some text on each page of the merged document. My problem is this: I can do both things separately (merge PDFs and write text to a PDF), I just can't seem to do it…
0 votes, 1 answer

PyMuPDF get optimal font size given a rectangle

I am making an algorithm that performs certain edits to a PDF using the fitz module of PyMuPDF, more precisely inside widgets. The font size 0 has a weird behaviour, not fitting in the widget, so I thought of calculating the distance myself. But…
Clement Genninasca
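Because the width of a string grows linearly with the font size, the largest size that fits can be computed from one measurement at size 1. The sketch below assumes single-line text and a line height of 1.2 times the font size; both are assumptions, and `fit_fontsize` is a hypothetical helper. With PyMuPDF, the `measure` callable could wrap `fitz.get_text_length(text, fontname="helv", fontsize=size)`.

```python
def fit_fontsize(text, rect_width, rect_height, measure,
                 max_size=72.0, line_height=1.2):
    """Largest font size at which single-line `text` fits the rectangle.

    `measure(text, fontsize)` must return the rendered text width.
    `line_height` (assumed 1.2) is the line height as a multiple of font size.
    """
    if not text:
        return max_size
    unit_width = measure(text, 1.0)  # width at font size 1
    by_width = rect_width / unit_width if unit_width > 0 else max_size
    by_height = rect_height / line_height
    return min(by_width, by_height, max_size)
```

Widgets usually have some inner padding, so subtracting a small border from the rectangle before calling the function may give better results.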
0 votes, 1 answer

PyMuPDF detects two paragraphs with close text-block coordinates as one

I face a problem when I use fitz to detect PDF layout: two paragraphs are detected as one text block if the line margin between them is small. For example, I want to detect the text and the isolated formula as two text blocks, but for now fitz…
CAO RUI
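One possible workaround is to ignore the block grouping and regroup the individual lines by the vertical gap between them. The sketch below works on line bounding boxes, which PyMuPDF reports under `page.get_text("dict")["blocks"][i]["lines"][j]["bbox"]`. `split_by_gap` is a hypothetical helper and the `gap_factor` threshold of 0.8 is an arbitrary assumption that would need tuning per document.

```python
def split_by_gap(line_boxes, gap_factor=0.8):
    """Group line boxes, starting a new group at each large vertical gap.

    `line_boxes` are (x0, y0, x1, y1) tuples sorted top to bottom. A gap is
    "large" when it exceeds `gap_factor` times the previous line's height.
    """
    groups = []
    for box in line_boxes:
        if groups:
            prev = groups[-1][-1]
            gap = box[1] - prev[3]  # top of this line minus bottom of previous
            if gap <= gap_factor * (prev[3] - prev[1]):
                groups[-1].append(box)
                continue
        groups.append([box])
    return groups
```

Each resulting group can then be treated as its own paragraph, for example by taking the union of its boxes.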
0 votes, 2 answers

Why does extracting file data in PyMuPDF give me empty lists?

I am new to programming (just do it for fun sometimes) and I am having trouble using PyMuPDF. In VS Code, it returns no errors but the output is always just an empty list. Here is the code: > import fitz file_path =…
0 votes, 1 answer

python fitz page.add_highlight_annot(start=pointa, stop=pointb) not working

I'm trying to highlight text in a PDF from a start word "pointa" to an end word "pointb", but it won't work: it marks all the text on the page. Maybe someone could help me (please) and figure out what I'm doing wrong. import fitz import…
kalimero00
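The excerpt is truncated, so the actual cause is unknown. One thing worth checking is that `start` and `stop` are point coordinates rather than words. The sketch below derives such points from word tuples in the layout returned by `page.get_text("words")`, that is `(x0, y0, x1, y1, word, block_no, line_no, word_no)`. `selection_points` is a hypothetical helper.

```python
def selection_points(words, start_word, stop_word):
    """Return (start_point, stop_point) for the span between two words.

    The start point is the top-left corner of the first `start_word`, the
    stop point is the bottom-right corner of the next `stop_word` after it.
    Returns None if either word is missing.
    """
    start = None
    for w in words:
        if start is None:
            if w[4] == start_word:
                start = (w[0], w[1])
        elif w[4] == stop_word:
            return start, (w[2], w[3])
    return None
```

With PyMuPDF, the result could then be passed on as `page.add_highlight_annot(start=points[0], stop=points[1])` after checking that `points` is not `None`.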
0 votes, 2 answers

Is there an efficient way to execute a program with similar names using Python in the terminal?

I'm trying to process PDFs using PyMuPDF and I'm running this python file called process_pdf.py in the terminal. > import sys, fitz > fname = sys.argv[1] # get document filename > doc = fitz.open(fname) # open document > out = open(fname + ".txt",…
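If the goal is to run the same extraction over several similarly named files, one option is to let the script accept wildcard patterns and loop over the matches. This is a sketch under that assumption: `matching_pdfs` and `extract_text` are hypothetical names, and the output naming (`fname + ".txt"`) follows the excerpt.

```python
import glob
import sys


def matching_pdfs(patterns):
    """Expand wildcard patterns and return the matching .pdf paths, sorted."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(pattern):
            if path.lower().endswith(".pdf"):
                found.add(path)
    return sorted(found)


def extract_text(pdf_path):
    """Write the plain text of every page to <pdf_path>.txt."""
    import fitz  # PyMuPDF, imported lazily

    with fitz.open(pdf_path) as doc, open(pdf_path + ".txt", "wb") as out:
        for page in doc:
            out.write(page.get_text().encode("utf8"))
            out.write(b"\f")  # form feed as a page separator


if __name__ == "__main__":
    for path in matching_pdfs(sys.argv[1:]):
        extract_text(path)
```

It could then be called as `python process_pdf.py "report_*.pdf"`. If the shell expands the wildcard itself, the script still works because each literal path matches itself.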
0 votes, 1 answer

In PyMuPDF what does the string of letters at the start of a Font name represent?

As can be seen in the PyMuPDF documentation for get_page_fonts, the returned fonts have names like FNUUTH+Calibri-Bold or DOKBTG+Calibri. What do the string prefixes (FNUUTH+, DOKBTG+) represent?
Tolure
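In the PDF specification, the name of an embedded font subset is prefixed with a tag of exactly six uppercase letters followed by a plus sign. The prefixes quoted above match that pattern, which suggests the fonts are embedded as subsets. A small sketch for separating the tag from the base font name (`split_subset_tag` is a hypothetical helper):

```python
import re

# Six uppercase letters, a plus sign, then the base font name.
_SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_tag(font_name):
    """Return (tag, base_name); tag is None if the name has no subset prefix."""
    match = _SUBSET_TAG.match(font_name)
    if match:
        return match.group(1), match.group(2)
    return None, font_name
```

The tag itself carries no meaning beyond distinguishing different subsets of the same font within a file.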