Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

Supported files have the extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub, so you can, for example, develop e-book viewers in Python.

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. The name is historical: Fitz is the graphics library that became MuPDF's rendering engine.
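A minimal sketch of how the description above might translate into code, assuming PyMuPDF is installed. The helper names is_supported and count_pages are hypothetical and not part of PyMuPDF; only fitz.open is a library call.

```python
from pathlib import Path

# Extensions named in the tag description above.
SUPPORTED_EXTENSIONS = {".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"}


def is_supported(filename):
    """Hypothetical helper: True if the extension is one MuPDF can open."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def count_pages(filename):
    """Hypothetical helper: open a document with PyMuPDF and count its pages."""
    # PyMuPDF is imported under the name fitz. The import is done lazily
    # here so that is_supported() still works when PyMuPDF is missing.
    import fitz

    with fitz.open(filename) as doc:
        return len(doc)
```

The extension check uses only the standard library, so it can filter a folder listing before any document is opened; count_pages would then be called only on the files that pass.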

257 questions

1 answer

Python scraping an unstructured PDF

We get biweekly software releases from a supplier who provides us with PDF release notes. The notes contain a lot of irrelevant material, but ultimately we need to manually copy and paste information from these notes into a Confluence…

python python-3.x pandas pymupdf

asked Aug 31 '20 at 12:30

Isaac

1 answer

Extracting complete hyperlink string from PDF using PyMuPDF

I'm trying to extract every single link from a PDF. I'm able to get every single hyperlink using this code: folder = "test_folder" folder_data = [os.path.join(dp, f) for dp, dn, filenames in os.walk(folder) for f in filenames if…

python pdf pymupdf

asked Mar 12 '20 at 14:13

jorge gill

1 answer

How to auto resize QVBoxLayout according to its child contents inside a QScrollArea?

Recently, I have been trying to use PyQt5 to make a PDF viewer. I adapted the code provided in this post (Image Viewer GUI fails to properly map coordinates for mouse press event). I created a QScrollArea that contains a QVBoxLayout in order to dynamically…

python pdf pyqt pyqt5 pymupdf

asked Feb 25 '20 at 08:31

ps2pspgood

2 answers

Extract text in a rectangle from pdf - Python

I need to extract the text inside a rectangle in a PDF. I have tested several methods, but none of them returns the specific text. For example, I tested the PyMuPDF, pdfplumber, tabula, camelot and pdftables packages. In the PyMuPDF module…

python text-extraction pdf-extraction pymupdf

asked Feb 13 '20 at 07:58

Kamaal Shaik

2 answers

Extract images of pdf with pymupdf in right order

I am currently working on a Python 3.x image extractor for PDF files and can't seem to find a solution for a problem I have been facing throughout my work. My intention is to extract all the images from PDF files (vehicle reports) without the logos…

python-3.x pdf image-extraction pymupdf

asked Sep 02 '19 at 08:59

Jani

0 answers

How can I determine whether a PDF page contains redacted material?

I have a set of PDFs, for which some pages have had partial contents redacted through Adobe Acrobat. I would like to programmatically iterate through each page and determine whether the page contains redacted content, preferably using Python (note…

python pdf acrobat pymupdf

asked Aug 08 '19 at 18:12

crkm

2 answers

How do I access the text from a specific pdf page rather than the entire document

I am trying to extract some stuff from some pdf documents. I have been mucking around with various tools though I have invested the most in pdfminer and pymupdf. I started with pdfminer but started testing pymupdf after not being able to address…

python pdf pymupdf

asked Jun 19 '19 at 22:43

PyNEwbie

0 answers

How to handle ligature issue while using pdf text

I need to capture some text from some PDFs, and I use PyMuPDF to do this. However, I am facing a ligature issue when writing the selected text to a text file. I use the following code snippet to read the PDF: pdf = fitz.open("file_path") full_text = "" for…

python text-processing python-unicode pymupdf

asked Aug 18 '23 at 15:12

WhyMeasureTheory

1 answer

How to match placement, font, style and size of replaced text with search text in PDF files using Python?

I'm using Python and the PyMuPDF library to search for and replace text in PDF files. It's working properly, but the style of colored text is not preserved on replacement. How can I fix this? Here's the code I'm currently using: import os import fitz # Prompt user for…

python pdf pymupdf

asked Jun 27 '23 at 04:34

Hetul

1 answer

PyMuPDF: extract PDF information into a CSV file, from multiple files. Why is this code only extracting data from the first page of each PDF?

I am trying to extract specific information from every PDF file in a folder into a single CSV file. Each PDF has the information across multiple pages. However, something is wrong with my loop or how it is implemented, and I am not quite sure why. The…

python loops csv pdf pymupdf

asked Jun 21 '23 at 16:11

J D

0 answers

How to highlight a blob of text using PyMuPDF

I have a PDF file that I read with the PyMuPDF package. I read the text and break it into chunks. So for the text in the screenshot below, taken from one of the pages of the original PDF, I get the text read as below: The text I have in…

python pymupdf

asked Jun 20 '23 at 02:34

Baktaawar

0 answers

How can I improve the PDF compression quality in my Python code using the PyMuPDF library?

Main goal: the goal of this side project is to make a script that can read all the files in a Google Drive, identify all the PDFs and compress each PDF file so that it takes less space. Below is how far I have got. I have a Python script that uses the…

python-3.x google-colaboratory pymupdf

asked Apr 13 '23 at 04:51

Pranay kumar

1 answer

Recognizing drop caps in PDF in python

I'm currently using pymupdf to extract text blocks from a file in python. import fitz doc = fitz.open(filename) for page in doc: text = page.get_text("blocks") for item in text: print(item[4]) The problem is that drop caps are…

python extract pymupdf

asked Mar 11 '23 at 03:00

Esraa Abdelmaksoud

1 answer

How can I edit/modify/replace text in an existing PDF file?

I am working on my final-year project, a website where a user can read PDFs. I am adding some features, such as converting currencies to the user's local currency. I am using Flask and PyMuPDF for my project and I don't know how I…

python flask pypdf pymupdf

asked Feb 08 '23 at 14:36

abhinav srivastava

0 answers

How can I either ignore blank pages in a PDF using Python or add blank pages to a location without changing the total number of pages until the document is saved?

So I'm using the tkinter and pymupdf libraries to add blank pages to a desired location. This is done by pressing a button which inserts the blank page below the page on the button. My issue is that once it inserts the blank page, the original page…

python pdf tkinter range pymupdf

asked Jan 23 '23 at 19:24

mrawesome0238
