Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

Supported files have the extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub, so you can also develop e-book viewers in Python.

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. This has a historical reason: Fitz was the name of the graphics library that became the rendering engine of MuPDF.

257 questions

votes

0 answers

replace text in PDF files

I want to replace text in PDF files. I found some solutions, but they don't work, so please help me. I am using PyMuPDF's xref stream and get this: BT 0 0 0 0 scn /C0_0 1 Tf 13.72 0 0 14 14.0156 76.1611 Tm <0D1804E10632>Tj /C0_1 1 Tf -0.01 Tc 0.01 Tw 7.84 0 0 8…

python pdf pymupdf

asked May 31 '23 at 08:08

KurtKim

votes

0 answers

How can I remove the ® character from multiple multi-page PDFs using PyMuPDF for Camelot in Python?

How can I remove an illegal character (®), or edit text in general, in a PDF using Python, specifically PyMuPDF? I've been trying for hours to remove the trademark symbol ® from about a thousand multi-page PDFs so that I can scrape the table data into a CSV.…

python pdf data-cleaning pymupdf python-camelot

asked May 24 '23 at 23:48

J D

votes

0 answers

Extraction of position of an image in a PDF file

I am using the PyMuPDF library to extract images from a PDF file. I want to get the position (origin) and the size of each image. I can get the sizes, but I can't get the positions correctly using: def…

python pymupdf

asked May 23 '23 at 13:03

Toyo

votes

1 answer

Adding PDF values and export values to ComboBox using PyMuPDF

I am trying to set both a face (display) value and an export value for a PDF combo box using the PyMuPDF module, but I can't find a way to do it. With Adobe's JavaScript API it would be something like this: f.setItems( ["Ohio", "OH"], ["Oregon", "OR"],…

python pdf pymupdf

asked May 17 '23 at 19:00

Camilo

votes

0 answers

Install PyMuPDF

While I was installing requirements, this error popped up: ERROR: Failed to build wheels for PyMuPDF, which is required to install projects based on pyproject.toml. I tried to install PyMuPDF with python setup.py install, but then: Traceback…

python installation pip pymupdf

asked May 10 '23 at 13:07

S_s_s_s_S

votes

1 answer

How to use custom font for rendering epub doc in MuPDF?

My question seems simple, but I have searched and read a lot of code without success. How can I make my own (non-built-in) font available to MuPDF so that it renders an EPUB document using that font? I tried to load my font as follows, without success: //…

c++ epub font-family mupdf pymupdf

asked May 03 '23 at 19:33

S.M.Mousavi

votes

0 answers

How to match the placement, font and size of replaced text with the searched text in PDF files using Python?

I'm using Python and the PyMuPDF library to search for and replace text in PDF files. My code successfully finds and replaces the text, but the font and size of the replacement text differ from those of the original text. I want the…

python pdf pymupdf

asked Apr 29 '23 at 18:38

Sik Saw

votes

1 answer

Modify one element of a namedtuple in a list

I have written a script to extract some information from a PDF file. Each page is read as blocks. If [V2G is found, the script saves it together with the title, subtitle and the bulleted list. My code: data = [] req = namedtuple('Req', 'a b c d e…
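Namedtuples are immutable, so a field cannot be assigned in place; the usual idiom is to build a modified copy with the _replace() method and store it back in the list. A minimal sketch, where the Req fields, the sample values and the helper name update_record are hypothetical placeholders rather than the asker's actual data:

```python
from collections import namedtuple

# Hypothetical record type with placeholder field names
Req = namedtuple("Req", "a b c d e")


def update_record(records, index, **changes):
    """Hypothetical helper: swap records[index] for a copy with changed fields."""
    records[index] = records[index]._replace(**changes)
    return records[index]


data = [
    Req("[V2G-1]", "Title", "Subtitle", ["first bullet"], ""),
    Req("[V2G-2]", "Title", "Subtitle", [], ""),
]
update_record(data, 1, d=["added bullet"], e="checked")
print(data[1])
# Req(a='[V2G-2]', b='Title', c='Subtitle', d=['added bullet'], e='checked')
```

A list stored inside a namedtuple can still be mutated in place (data[0].d.append(...)), because the tuple only holds a reference to it. If records change often, a dataclass is a mutable alternative.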

python namedtuple pymupdf

asked Apr 19 '23 at 17:20

user34088

votes

1 answer

Python: Determine PDF Pages Containing Images

I get a PDF from another department with a huge number of pages (about 1500). This PDF is a compilation of subdistrict documents in a district. To verify this data, I want to extract it from the PDF. On my first try, I used PDFMiner to extract the text, but this…

python-3.10 pymupdf

asked Apr 18 '23 at 13:14

Alfian Khusnul

votes

0 answers

Get Metadata for each page from a batch PDF

I am trying to extract the page name (shown in the screenshot below) for each page from the batch PDF which has been produced from AutoCAD. I have tried PyMuPDF, PyPDF2 and PDFMiner but I can't seem to find where this info is stored in the PDF…

python pypdf pdfminer pymupdf

asked Apr 18 '23 at 08:43

Stark Arpit

votes

1 answer

PyMuPDF (fitz) inaccessible in Docker

I'm trying to get some OCR done in a Docker container, and since I couldn't get it to work with Tesseract, I tried refactoring to use PyMuPDF instead. The error I get is quite simple: File "/code/table.py", line 35, in import…

python docker tesseract python-tesseract pymupdf

asked Apr 15 '23 at 19:09

qoob

votes

0 answers

Detecting paragraphs in a PDF

How can I detect different "blocks" of text extracted from a PDF and split them into paragraphs? Could I use their positions to do this? PyMuPDF only puts one newline character between the blocks, and also one newline after one of the…
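Using positions is a workable heuristic. A minimal sketch, assuming single-column text: it takes tuples in the shape returned by page.get_text("blocks"), namely (x0, y0, x1, y1, text, block_no, block_type), sorts the text blocks top to bottom, and starts a new paragraph whenever the vertical gap to the previous block exceeds a threshold. The function name group_paragraphs and the 5-point default for max_gap are hypothetical choices to tune per document; multi-column layouts would need the blocks grouped by column first.

```python
def group_paragraphs(blocks, max_gap=5.0):
    """Hypothetical helper: merge vertically close text blocks into paragraphs."""
    text_blocks = sorted(
        (b for b in blocks if b[6] == 0),  # block_type 0 = text, 1 = image
        key=lambda b: (b[1], b[0]),        # top-to-bottom, then left-to-right
    )
    paragraphs = []
    prev_bottom = None
    for x0, y0, x1, y1, text, *_ in text_blocks:
        cleaned = " ".join(text.split())   # collapse newlines and repeated spaces
        if not cleaned:
            continue
        if prev_bottom is not None and y0 - prev_bottom <= max_gap:
            paragraphs[-1] += " " + cleaned  # small gap: same paragraph
        else:
            paragraphs.append(cleaned)       # large gap: new paragraph
        prev_bottom = y1
    return paragraphs
```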

python pdf pymupdf

asked Apr 15 '23 at 06:29

Anm

votes

1 answer

How can I get the font name in a PDF file?

I have written a script to extract some information from a PDF file. My code: for page in doc: rect = fitz.Rect(22, 52, 562,802) # crop page margins to ignore header, footer, left side blocks = page.get_text("blocks",rect,…

python pdf pymupdf

asked Apr 04 '23 at 17:20

user34088

votes

1 answer

Python UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c

I am trying to open a file with PyMuPDF, do some edits, and then return it to the frontend. Here is the code: @app.post('/return_pdf') async def return_pdf(uploaded_pdf: UploadFile): print("Filetype: ", type(uploaded_pdf)) #

python python-3.x pymupdf

asked Apr 01 '23 at 16:28

Manuel

votes

0 answers

Error when filling PDF forms using 'fillpdf' library

When using the fillpdf library in Python to fill a PDF, the checkmarks for the radio buttons in the output PDF are off-center. Why is this? In the template I'm filling out, the checkmarks are centered on the radio buttons. Python code: from fillpdf import…

python pdf pymupdf pdf-form pdfrw

asked Mar 29 '23 at 00:48

seanpinoobers
