Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. The name is historical: Fitz was the name of the graphics library that became MuPDF's rendering engine.

257 questions
0
votes
0 answers

Replace text in PDF files

I want to replace text in PDF files. I found some solutions, but they don't work. Please help me... I am using PyMuPDF's xref stream and get this: BT 0 0 0 0 scn /C0_0 1 Tf 13.72 0 0 14 14.0156 76.1611 Tm <0D1804E10632>Tj /C0_1 1 Tf -0.01 Tc 0.01 Tw 7.84 0 0 8…
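
One likely reason direct edits to such a stream fail: the operand of Tj in the excerpt, <0D1804E10632>, is a hex string of character codes in the font's own encoding, not readable text. A minimal sketch of splitting it into codes, assuming the font /C0_0 uses two-byte codes (common for composite fonts, but not confirmed by the excerpt). The helper name split_hex_codes is hypothetical.

```python
def split_hex_codes(operand: str, code_size: int = 2) -> list[int]:
    """Split a PDF hex string operand such as '<0D1804E10632>' into integer codes.

    code_size is the number of bytes per character code. Two is an assumption
    here; the real value depends on the encoding of the font in use.
    """
    # Hex strings may contain whitespace, which is ignored.
    hex_digits = "".join(operand.strip().strip("<>").split())
    if len(hex_digits) % 2:
        hex_digits += "0"  # a missing final digit is treated as 0
    raw = bytes.fromhex(hex_digits)
    return [
        int.from_bytes(raw[i:i + code_size], "big")
        for i in range(0, len(raw), code_size)
    ]


codes = split_hex_codes("<0D1804E10632>")
print([f"{c:04X}" for c in codes])
```

Turning these codes into Unicode text needs the font's own mapping (for example a ToUnicode CMap, if the font has one), and replacement text has to be written in the same codes, which only works if the embedded font contains the required glyphs.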
0
votes
0 answers

How can I remove the ® character from multiple multi-page PDFs using PyMuPDF for Camelot in Python?

How can I remove an illegal character (®), or otherwise edit text, in a PDF using Python, specifically PyMuPDF? I've been trying for hours to remove a trademark symbol ® from about a thousand multi-page PDFs so that I can scrape the table data into a CSV.…
J D
  • 11
  • 2
0
votes
0 answers

Extraction of position of an image in a PDF file

I am using the PyMuPDF library to extract images from a PDF file. I want to get the position (origin) and the size of each image. I could get the sizes, but I can't get the position correctly using: def…
Toyo
  • 667
  • 1
  • 5
  • 22
0
votes
1 answer

Adding PDF values and export values to ComboBox using PyMuPDF

I am trying to set face and export values on a PDF combo box using the PyMuPDF module, but I can't find a way. Normally, using the Adobe JavaScript API, it would be something like this: f.setItems( ["Ohio", "OH"], ["Oregon", "OR"],…
Camilo
  • 335
  • 5
  • 7
0
votes
0 answers

Install PyMuPDF

I was installing requirements when this error popped up: ERROR: Failed to build wheels for PyMuPDF, which is required to install projects based on pyproject.toml. I tried to install PyMuPDF with python setup.py install, but then: Traceback…
0
votes
1 answer

How to use custom font for rendering epub doc in MuPDF?

My question seems simple, but I have searched and read a lot of code without success. How can I make my own (non-built-in) font available to MuPDF in order to render an EPUB document with it? I tried to load my font as follows, without success: //…
S.M.Mousavi
  • 5,013
  • 7
  • 44
  • 59
0
votes
0 answers

How to match the placement, font and size of replaced text with the search text in PDF files using Python?

I'm using Python and the PyMuPDF library to search for and replace text in PDF files. The code I have is able to successfully search for and replace the text, but the font and size of the replaced text are different from the search text. I want the…
Sik Saw
  • 33
  • 5
0
votes
1 answer

Modify one element of a namedtuple in a list

I have written a script to extract some information from a PDF file. Each page is read as blocks. If [V2G is found, the script saves it along with the title, subtitle and bulleted list. My code: data = [] req = namedtuple('Req', 'a b c d e…
user34088
  • 21
  • 4
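
Namedtuples are immutable, so a field of an item stored in a list cannot be assigned in place. The _replace method returns a modified copy that can be written back at the same index. A minimal sketch: the field names req_id, title and bullets are hypothetical, since the field list in the excerpt is cut off.

```python
from collections import namedtuple

# Hypothetical fields; the question's own field list is truncated.
Req = namedtuple("Req", "req_id title bullets")

data = [
    Req("REQ-1", "Old title", ["first bullet"]),
    Req("REQ-2", "Another title", []),
]

# data[0].title = "New title" would raise AttributeError.
# Build a modified copy and store it back at the same index instead.
data[0] = data[0]._replace(title="New title")

# A mutable field value (here a list) can still be changed in place.
data[1].bullets.append("added bullet")

print(data[0].title, data[1].bullets)
```

If most fields need frequent updates, a dataclass or a plain dict may be a better fit than a namedtuple.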
0
votes
1 answer

Python Determine PDF Pages Containing Image

I receive a PDF from another department with a huge number of pages (about 1500). This PDF is a compilation of subdistrict documents in a district. To verify this data, I want to extract it from the PDF. As a first try, I used PDFMiner to extract the text, but this…
0
votes
0 answers

Get Metadata for each page from a batch PDF

I am trying to extract the page name (shown in the screenshot below) for each page from the batch PDF which has been produced from AutoCAD. I have tried PyMuPDF, PyPDF2 and PDFMiner but I can't seem to find where this info is stored in the PDF…
0
votes
1 answer

PyMuPDF (fitz) inaccessible in Docker

I'm trying to get some OCR done in a docker file, and since I couldn't get it to work with Tesseract, I tried refactoring to use PyMuPDF instead. The error I get is quite simple: File "/code/table.py", line 35, in <module> import…
qoob
  • 137
  • 9
0
votes
0 answers

Detecting paragraphs in a PDF

How can I detect different "blocks" of text extracted from a PDF to split them into paragraphs? Could I use their position to do this? PyMuPDF only puts one newline character between the blocks, and also one newline after one of the…
Anm
  • 447
  • 4
  • 15
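
Block positions can be used for this. A minimal sketch, assuming a single-column layout and input tuples shaped like the items returned by page.get_text("blocks"), that is (x0, y0, x1, y1, text, block_no, block_type). The function name group_blocks and the max_gap threshold are hypothetical, and the sample coordinates are made up.

```python
def group_blocks(blocks, max_gap=6.0):
    """Merge vertically adjacent text blocks into paragraphs.

    max_gap is the largest vertical gap (in points) that still counts as the
    same paragraph. The default is an arbitrary guess and needs tuning.
    """
    # Keep text blocks only (block_type 0) and read them top to bottom.
    text_blocks = sorted(
        (b for b in blocks if b[6] == 0), key=lambda b: (b[1], b[0])
    )
    paragraphs = []
    prev_bottom = None
    for x0, y0, x1, y1, text, *_ in text_blocks:
        cleaned = " ".join(text.split())  # collapse line breaks inside a block
        if not cleaned:
            continue
        if prev_bottom is not None and y0 - prev_bottom <= max_gap:
            paragraphs[-1] += " " + cleaned  # small gap: same paragraph
        else:
            paragraphs.append(cleaned)  # large gap: start a new paragraph
        prev_bottom = y1
    return paragraphs


sample_blocks = [
    (72, 100, 500, 112, "First line of a paragraph\n", 0, 0),
    (72, 114, 500, 126, "continues here.\n", 1, 0),
    (72, 150, 500, 162, "A new paragraph.\n", 2, 0),
    (72, 170, 300, 260, "image placeholder", 3, 1),
]
paragraphs = group_blocks(sample_blocks)
print(paragraphs)
```

Multi-column pages would need the blocks split by column first, because sorting by the top coordinate alone interleaves the columns.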
0
votes
1 answer

How can I get the font name in a PDF file?

I have written a script to extract some information from a PDF file. My code: for page in doc: rect = fitz.Rect(22, 52, 562,802) # crop page margins to ignore header, footer, left side blocks = page.get_text("blocks",rect,…
user34088
  • 21
  • 4
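
The "blocks" output carries no font information, while the "dict" output of page.get_text has "font" and "size" keys on every span. A minimal sketch that walks such a structure: the function name fonts_in_page_dict is hypothetical, and the sample dictionary is hand-built rather than taken from a real file.

```python
def fonts_in_page_dict(page_dict):
    """Collect (font name, size) pairs from a page.get_text("dict") style structure."""
    found = set()
    for block in page_dict.get("blocks", []):
        # Image blocks have no "lines" key, so they contribute nothing.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                found.add((span["font"], round(span["size"], 1)))
    return sorted(found)


# Hand-built sample mimicking the nesting blocks -> lines -> spans.
sample = {
    "blocks": [
        {
            "type": 0,
            "lines": [
                {
                    "spans": [
                        {"font": "Helvetica-Bold", "size": 14.0, "text": "Title"},
                        {"font": "Helvetica", "size": 10.0, "text": "Body"},
                    ]
                }
            ],
        },
        {"type": 1},  # an image block
    ]
}
fonts = fonts_in_page_dict(sample)
print(fonts)
```

With a real page, the input would presumably come from page.get_text("dict", clip=rect), which keeps the margin cropping from the question.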
0
votes
1 answer

Python UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c

I am trying to open a file with PyMuPDF, do some edits, and then return it to the frontend. Here is the code: @app.post('/return_pdf') async def return_pdf(uploaded_pdf: UploadFile): print("Filetype: ", type(uploaded_pdf)) #
Manuel
  • 3
  • 1
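
This error usually means that binary PDF data is being decoded as text somewhere between upload and response. The excerpt does not show where that happens, so the cause is an assumption. A minimal sketch with made-up bytes, showing why the decode fails and that the payload should stay as bytes:

```python
# Made-up bytes that start like a PDF; real files contain similar binary data.
pdf_bytes = b"%PDF-1.7\n%\x9c\xe2\xe3\n1 0 obj\n"

try:
    pdf_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    print("cannot decode:", exc.reason)

# Treat the payload as opaque bytes from upload to response instead.
looks_like_pdf = pdf_bytes.startswith(b"%PDF-")
print(decoded_ok, looks_like_pdf)
```

With PyMuPDF, that would mean opening the uploaded bytes with fitz.open(stream=data, filetype="pdf"), getting the edited result from doc.tobytes(), and sending it as a binary response with the media type application/pdf rather than as a string.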
0
votes
0 answers

Error when filling PDF forms using 'fillpdf' library

When using the fillpdf library in Python to fill a PDF, the output PDF has the checkmarks for the radio buttons off center. Why is this? Template I'm using to fill out: Checkmarks are centered on radiobuttons Python Code: from fillpdf import…