Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).
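
As a rough illustration of the extension list above, here is a minimal sketch that checks whether a filename ends in one of those extensions. The helper name `has_supported_extension` is hypothetical and not part of PyMuPDF; a matching extension does not guarantee that MuPDF can actually open the file.

```python
from pathlib import Path

# Extensions taken from the tag description above.
SUPPORTED_EXTENSIONS = {".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"}


def has_supported_extension(filename):
    """Return True if the filename ends in one of the listed extensions.

    Hypothetical helper: it looks at the name only, not the file contents.
    """
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


print(has_supported_extension("book.EPUB"))
print(has_supported_extension("notes.txt"))
```

The comparison is case-insensitive, so names such as `book.EPUB` are treated the same as `book.epub`.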

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. This has a historical reason.
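
A minimal sketch of what that import looks like in practice, assuming PyMuPDF is installed. The function names `pymupdf_available` and `page_count` are hypothetical helpers written for this illustration; only `fitz.open()`, `len(doc)` and `doc.close()` come from the library.

```python
import importlib.util


def pymupdf_available():
    """Return True if a module named 'fitz' can be found for import.

    Assumption: a module with that name is PyMuPDF and not an
    unrelated package that happens to share the name.
    """
    return importlib.util.find_spec("fitz") is not None


def page_count(path):
    """Open a document with PyMuPDF and return its number of pages."""
    import fitz  # PyMuPDF is imported under the name 'fitz'

    doc = fitz.open(path)
    try:
        return len(doc)
    finally:
        doc.close()


if pymupdf_available():
    print("A module named 'fitz' can be imported")
else:
    print("PyMuPDF does not appear to be installed")
```

The import is placed inside `page_count` so that the module itself still loads on machines without PyMuPDF.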

257 questions

1 answer

Identify the edited location in the PDF modified by online editor www.ilovepdf.com using Python

I have an SBI bank statement PDF which has been tampered with/forged. Here is the link for the PDF. This PDF was edited using the online editor www.ilovepdf.com. The edited part is the first entry under the 'Credit' column. The original entry was '2,412.00' and I have…

python pdf pypdf pdfminer pymupdf

asked Feb 23 '21 at 00:27

Abhishek Tanksali

1 answer

Python 3.7 else statement not showing correct index?

My goal here is to print lines from text files together. Some lines, however, are not together like they should be. I resolved the first problem where the denominator was on the line after. For the else statement, they all seem to have the same…

python pymupdf

asked Feb 10 '21 at 01:46

theanton205

1 answer

Fields "Created" and "Modified" in Document Properties (PDF) were not displayed

I have merged many PDFs into a single PDF. I added metadata that includes the two fields "Created" and "Modified", but these fields still do not display any information. Here's my source code: import…

python python-3.x pymupdf python-pdfreader

asked Feb 03 '21 at 12:17

Thuấn Đào Minh

1 answer

Create a pdf file, write in it and return its byte stream with PyMuPDF

Using PyMuPDF, I need to create a PDF file, write some text into it, and return its byte stream. This is the code I have, but it uses the filesystem to create and save the file: import fitz path = "PyMuPDF_test.pdf" doc = fitz.open() …

python python-3.x pymupdf

asked Jan 28 '21 at 21:05

Xar

1 answer

Pymupdf getTextbox returns empty

I am trying to retrieve the text inside a rectangle. The rectangle is obtained from Page.getLinks(). When I try to get the text in the rectangle using getTextbox() and getText("text", clip=rect), both methods return an empty string.

python pdf pymupdf

asked Jan 20 '21 at 18:59

Tejaalle

0 answers

Attaching or stitching image piece at a particular position using python

I am extracting images from a given PDF file using the Python library PyMuPDF. Images that are constructed in a single layer are extracted perfectly, but images that have been constructed using multiple layers are being extracted in…

python image image-processing data-extraction pymupdf

asked Dec 15 '20 at 04:46

Sabster

1 answer

Paragraph extraction in PyMuPDF

I'm using PyMuPDF to extract text from PDFs in block units. In many cases, "blocks" seem to just default to newline-separated units, rather than logical paragraphs. import fitz doc = fitz.open("example.pdf") blocks = [x[4] for x in …

mupdf pdf-extraction pymupdf

asked Nov 06 '20 at 05:54

Guy De Pauw

2 answers

GET table of contents from a PDF with python

I'm trying to get the table of contents from a PDF. I'm using PyMuPDF for that purpose, but it only extracts the ToC if the PDF contains bookmarks. Otherwise it just returns an empty list. def get_Table_Of_Contents(doc): toc = doc.getToC() …

python pdf text nlp pymupdf

asked Nov 05 '20 at 15:16

sheshank

1 answer

Is there any way to identify crossed out words in PDF file while parsing it using Python?

I am parsing a PDF file using PyMuPDF (great library, by the way!), but I need to identify words that are crossed out. Is there any way to do that?

python parsing pdf pymupdf

asked Aug 26 '20 at 08:08

Aleke Coder

1 answer

Why is the MuPDF MediaBox of a page smaller than a contained image?

For this example PDF, I did this: import fitz doc = fitz.open("PDF-export-example-image-ocr.pdf") print(f"(1) {doc[0].bound()=}") print(f"(2) {doc[0].MediaBox=}") print(f"(3) {doc[0].getImageList()}") doc.close() which gives: (1)…

python pymupdf

asked Aug 20 '20 at 09:38

Martin Thoma

1 answer

rotate PDF 90 degrees relative to current rotation

I have rotated a PDF using fitz by 90 degrees using this code: fitz_doc = fitz.open(origin, filetype="pdf") fitz_doc_name = f"{fitz_doc.name}.pdf" page = fitz_doc[int(0)] page.setRotation(90) fitz_doc.save(fitz_doc_name) fitz_doc.close() However,…

python python-3.x pymupdf

asked Jun 01 '20 at 22:44

kravb

1 answer

Replacing Images with Image Names instead in Pdf using pymupdf

Using PyMuPDF, I want to extract all images from a PDF and save them separately, then replace each image in the PDF with just its image name at the same position and save the result as another document. I can save all images with the following code. import…

python image pdf pdf-to-html pymupdf

asked May 23 '20 at 09:09

Mohammad Ahmed

2 answers

Finding strings in PDF and highlight them using Python

I am trying to search for strings in a PDF, highlight them, and save the result using Python. The data file is an Excel sheet (column 2) and contains special characters as well. I tried using the PyMuPDF library for this, but it's giving the error below: " Below is the…

python-3.x pdf highlight pymupdf

asked May 21 '20 at 05:53

Vir

1 answer

Color issue when saving PDF page Pixmap as PNG using PyMuPDF

I'm running the following bit of Python code from the PyMuPDF 1.16.17 documentation, which saves a PNG image for every page in a PDF file. import sys, fitz # import the binding fname = "test.pdf" # get filename from command line doc =…

pdf cmyk pixmap pymupdf

asked Apr 21 '20 at 02:56

Leandro Nogueira Couto

2 answers

Python PyMuPDF / Fitz rotates image from extractImage

I am pulling out embedded images from PDF pages using PyMuPDF / Fitz. This works great for some PDF files, but for certain ones the image is rotated 90 degrees. I don't see any condition that could be used to correct this. Has anyone experienced this?…

python pdf pymupdf

asked Mar 03 '20 at 20:33

TChi
