Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”.

MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. This has a historical reason.
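
As a minimal sketch of that import, assuming PyMuPDF is installed (for example via pip install PyMuPDF): the helper name empty_pdf_page_count is made up for illustration. It opens a new, empty in-memory PDF and reports its page count, or returns None when the fitz module cannot be found.

```python
import importlib.util


def empty_pdf_page_count():
    """Return the page count of a new empty PDF, or None if PyMuPDF is missing."""
    if importlib.util.find_spec("fitz") is None:
        return None  # PyMuPDF (module name: fitz) is not installed
    import fitz  # the historical import name of PyMuPDF

    doc = fitz.open()  # no arguments: create a new, empty PDF in memory
    try:
        return len(doc)  # number of pages in the document
    finally:
        doc.close()


print(empty_pdf_page_count())
```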

257 questions
0
votes
1 answer

Identify the edited location in the PDF modified by online editor www.ilovepdf.com using Python

I have an SBI bank statement PDF which is tampered/forged. Here is the link for the PDF. This PDF is edited using online editor www.ilovepdf.com. The edited part is the first entry under the 'Credit' column. Original entry was '2,412.00' and I have…
0
votes
1 answer

Python 3.7 else statement not showing correct index?

My goal here is to print lines from text files together. Some lines, however, are not together like they should be. I resolved the first problem where the denominator was on the line after. For the else statement, they all seem to have the same…
0
votes
1 answer

Fields "Created" and "Modified" in Document Properties (PDF) were not displayed

I have merged many PDFs into a single PDF and added metadata that includes the two fields "Created" and "Modified", but these fields still do not display any information. Here's my source code: import…
0
votes
1 answer

Create a pdf file, write in it and return its byte stream with PyMuPDF

Using PyMuPDF, I need to create a PDF file, write some text into it, and return its byte stream. This is the code I have, but it uses the filesystem to create and save the file: import fitz path = "PyMuPDF_test.pdf" doc = fitz.open() …
Xar
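
One possible approach to the question above, as a hedged sketch rather than a definitive answer: it assumes a recent PyMuPDF release with snake_case method names (older releases, like the one in the question, spelled these newPage, insertText and write). The function name make_pdf_bytes is hypothetical. No file is written to disk.

```python
import importlib.util


def make_pdf_bytes(text):
    """Build a one-page PDF in memory and return it as bytes.

    Returns None if PyMuPDF (module name: fitz) is not installed.
    """
    if importlib.util.find_spec("fitz") is None:
        return None
    import fitz

    doc = fitz.open()                 # new empty PDF, nothing touches the filesystem
    page = doc.new_page()             # add a page with the default size
    page.insert_text((72, 72), text)  # start point of the text, in points
    data = doc.tobytes()              # serialize the whole document to bytes
    doc.close()
    return data


pdf_bytes = make_pdf_bytes("Hello from PyMuPDF")
```

The returned bytes can be passed on directly, for example as an HTTP response body.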
0
votes
1 answer

Pymupdf getTextbox returns empty

I have tried to retrieve the text in a rectangle. This rectangle is retrieved from Page.getLinks(). When I try to get the text in the rectangle using getTextbox() and getText("text", clip=rect), both methods return an empty string.
Tejaalle
0
votes
0 answers

Attaching or stitching image piece at a particular position using python

I am extracting images from a given PDF file using the Python library PyMuPDF. Images that are constructed in a single layer are extracted perfectly. But images that have been constructed using multiple layers are being extracted in…
Sabster
0
votes
1 answer

Paragraph extraction in PyMuPDF

I'm using PyMuPDF to extract text from PDFs from block units. In many cases, "blocks" seem to just default to newline separated units, rather than logical paragraphs. import fitz doc = fitz.open("example.pdf") blocks = [x[4] for x in …
Guy De Pauw
0
votes
2 answers

GET table of contents from a PDF with python

I'm trying to get Table of Contents from a PDF. I'm using PyMuPDF for that purpose. But it only extracts ToC if the PDF consists of Bookmarks. Otherwise it only results in an empty list. def get_Table_Of_Contents(doc): toc = doc.getToC() …
sheshank
0
votes
1 answer

Is there any way to identify crossed out words in PDF file while parsing it using Python?

I am parsing a PDF file using PyMuPDF (great library, by the way!), but I need to identify words that are crossed out. Is there any way to do that?
0
votes
1 answer

Why is the MuPDF MediaBox of a page smaller than a contained image?

For this example PDF, I did this: import fitz doc = fitz.open("PDF-export-example-image-ocr.pdf") print(f"(1) {doc[0].bound()=}") print(f"(2) {doc[0].MediaBox=}") print(f"(3) {doc[0].getImageList()}") doc.close() which gives: (1)…
Martin Thoma
0
votes
1 answer

rotate PDF 90 degrees relative to current rotation

I have rotated a PDF using fitz by 90 degrees using this code: fitz_doc = fitz.open(origin, filetype="pdf") fitz_doc_name = f"{fitz_doc.name}.pdf" page = fitz_doc[int(0)] page.setRotation(90) fitz_doc.save(fitz_doc_name) fitz_doc.close() However,…
kravb
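
The arithmetic behind a rotation relative to the current one can be sketched as follows. The helper name rotated_by is made up for illustration; it only computes the new absolute angle and does not touch any PDF.

```python
def rotated_by(current, delta=90):
    """Return the absolute rotation after turning `delta` degrees from `current`.

    The result is wrapped into the range 0..359.
    """
    return (current + delta) % 360


# A page already rotated by 90 degrees ends up at 180 after one more quarter turn.
print(rotated_by(90))
```

With PyMuPDF, the current value is available as page.rotation, so the result could be passed to page.setRotation(...) as used in the question (spelled set_rotation in recent releases).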
0
votes
1 answer

Replacing Images with Image Names instead in Pdf using pymupdf

Using PyMuPDF, I want to extract all images from a PDF, save them separately, replace each image in the PDF with just its image name at the same position, and save the result as another document. I can save all images with the following code. import…
Mohammad Ahmed
0
votes
2 answers

Finding strings in PDF and highlight them using Python

I am trying to search for strings in a PDF, highlight them, and save the result using Python. The data file is an Excel sheet (column 2) and contains special characters as well. I tried using the PyMuPDF library for this, but it's giving the below error: " Below is the…
Vir
0
votes
1 answer

Color issue when saving PDF page Pixmap as PNG using PyMuPDF

I'm running the following bit of Python code from the PyMuPDF 1.16.17 documentation, which saves PNG images for every page in a PDF file. import sys, fitz # import the binding fname = "test.pdf" # get filename from command line doc =…
0
votes
2 answers

Python PyMuPDF / Fitz rotates image from extractImage

I am pulling out embedded images from PDF pages using PyMuPDF / Fitz. This works great for some PDF files, but for certain ones the image is rotated 90 degrees. I don't see any condition that could be used to correct this. Has anyone experienced this?…
TChi