Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. The name is historical: Fitz was the name of the graphics rendering library developed for MuPDF.
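As a minimal sketch of the points above, the snippet below checks a file name against the extensions listed in this description and then opens the file through import fitz. The names SUPPORTED_EXTENSIONS, is_supported and open_document are hypothetical helpers introduced here for illustration; only fitz.open is a PyMuPDF call, and it is imported lazily so the extension check runs even where PyMuPDF is not installed.

```python
from pathlib import Path

# Extensions taken from the tag description above.
SUPPORTED_EXTENSIONS = {".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"}


def is_supported(path):
    """Return True if the file extension is one of those listed above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def open_document(path):
    """Open a supported file with PyMuPDF (hypothetical convenience wrapper)."""
    if not is_supported(path):
        raise ValueError(f"unsupported file type: {path}")
    # PyMuPDF is imported under the name fitz; the import is deferred so that
    # is_supported() can be used without the library being installed.
    import fitz
    return fitz.open(path)
```

The extension check is case-insensitive, so is_supported("book.EPUB") is true while is_supported("notes.txt") is false. Assuming PyMuPDF is installed, open_document("input.pdf") would return the document object that the questions below work with.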

257 questions

1 answer

How to get background color of a Text in PyMuPDF

I am trying to see if I can identify possible table headers in a table inside a PDF using the background and foreground color of the text. With PyMuPDF text extraction, I was able to get the foreground color. I am wondering if there is a way to get background…

python pdf-extraction pymupdf

asked Sep 26 '19 at 06:30

Suvin K S

1 answer

Ways to separate passages in a PDF using gaps?

I have some PDFs with 2-3 passages on every page. Every passage is separated by a line gap, but when reading with PyMuPDF, I cannot see any machine-printable separator between passages. Is there any other way, or any other library that can do…

pdf pdfminer pdftotext pymupdf pdfium

asked Sep 02 '22 at 09:24

Saivenkataraju

0 answers

How to use fitz (PyMuPDF) with py2app or pyinstaller [ModuleNotFoundError]?

I want to convert my Python script, which contains a PDF-to-image converter, to a .app file on macOS and be able to run it on a different machine. I have tried both pyinstaller and py2app and get the following error message: Traceback (most recent…

python macos py2app pymupdf

asked Feb 10 '22 at 17:12

Bryan_Koh

1 answer

Why can't I correctly extract the image from this PDF? [Please need help]

I am currently working on OCR on PDF files. Here is my pipeline: I first extract the image from the PDF (since my PDF contains a scanned document) and convert it to a NumPy array, then I read it with Tesseract. It works pretty well on most of my images, but I have…

python-3.x pymupdf

asked Nov 18 '20 at 10:35

curious

0 answers

Capture screenshot from pdf page

I have a PDF document, and this page has an image of a graph plot; however, the legend of the plot is not part of the image. I am using PyMuPDF to extract this image as follows: for img in doc.getPageImageList(page_num, full=True): xref =…

python pymupdf

asked Oct 15 '20 at 12:04

CuriousBug

4 answers

Convert PDF file to multipage image

I'm trying to convert a multipage PDF file to image with PyMuPDF: pdffile = "input.pdf" doc = fitz.open(pdffile) page = doc.loadPage() # number of page pix = page.getPixmap() output = "output.tif" pix.writePNG(output) But I need to convert all the…

python image pdf pymupdf

asked Aug 30 '20 at 20:38

David Delos

0 answers

Decoding problem with fitz.Document in Python 3.7

I want to extract the text of a PDF and use some regular expressions to filter for information. I am coding in Python 3.7.4 using fitz for parsing the pdf. The PDF is written in German. My code looks as follows: doc = fitz.open(pdfpath) pagecount =…

python pymupdf text-decoding

asked Aug 21 '20 at 08:58

Riprip

3 answers

Adding text to a PDF using PyMuPDF

I'm trying to add text to a PDF by opening the PDF, adding a text box, and saving it. When I run the code, nothing happens. On the desktop, it shows the file has been updated, but there is no text displayed on it. Here's the code: import fitz doc =…

python pymupdf

asked Aug 05 '20 at 05:51

Khayla Black

2 answers

Can't read the content of a certain page of a pdf file available online

I've used PyMuPDF library to parse the content of any specific page of a pdf file locally and found it working. However, when I try to apply the same logic while parsing the content of any specific page of a pdf file available online, I encounter an…

python python-3.x pdf web-scraping pymupdf

asked Aug 16 '19 at 20:50

MITHU

2 answers

I am having an import error with the fitz library in PyCharm

I am having this issue of importing the fitz library in PyCharm. I pip installed PyMuPDF and in my code I added "import fitz" but it is giving me this error: ImportError:…

python python-3.x pycharm pymupdf

asked Aug 26 '23 at 22:23

jjboi8708

1 answer

Keywords being highlighted in wrong color using PyMuPDF

I'm doing some basic keyword highlighting, but I'm running into a strange issue. When I enter a stroke color with floating point RGB values (as shown below), the highlights come out in multiple different colors. In this case, I want the highlights…

python pdf pymupdf

asked Jun 22 '23 at 01:16

almosthavoc

1 answer

RTL (Arabic) ligatures problem when extracting text from PDF

When extracting Arabic text from a PDF file using libraries like PyMuPDF or PDFMiner, the words are returned in backward order, which is normal behavior for RTL languages, and you need to use the bidi algorithm to be able to display it correctly…

python pdfminer pymupdf bidi pdf-extraction

asked Jan 30 '23 at 03:41

Naourass Derouichi

1 answer

How to add a border to hyperlink with Fitz module?

I spent three hours experimenting with this this morning, but I can't manage to make the border visible on a hyperlink within a PDF annotated with the Python fitz module. Any ideas? Thanks so much in advance! import fitz doc =…

python pymupdf

asked Jul 22 '22 at 11:46

angelo95210

0 answers

Data Wrangling of text extracted from PDF using PyMuPDF possible? (alternating colors for each row) - text positioned in the middle for each row

I extracted data from a PDF file. I am sharing a sample of the page here. I extracted data from the PDF using Tabula-py. These are the arguments I used to extract the text from the PDF page. import numpy as np import pandas as pd from tabula.io import…

python pandas data-wrangling pymupdf tabula-py

asked Jul 03 '22 at 10:35

Joe

0 answers

How to spread text on multiple pages depending on text size?

What I tried: doc = fitz.open() page = doc.new_page() text = 'Long text' tw = fitz.TextWriter(page.rect) tw.append((20,40), text, small_caps=True) tw.write_text(page) doc.ez_save('test.pdf') How to spread text on multiple pages depending on text…

python pymupdf

asked Jul 02 '22 at 10:29

Zurechtweiser
