Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).
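As a rough illustration of the extension list above, the sketch below filters file names down to those formats. It uses only the standard library and checks names, not file contents. The names SUPPORTED_EXTENSIONS, is_supported and filter_supported are hypothetical helpers, not part of PyMuPDF.

```python
from pathlib import Path

# Extensions taken from the tag description; this set is an illustration,
# not a list queried from the library itself.
SUPPORTED_EXTENSIONS = {".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"}


def is_supported(path):
    """Return True if the file name ends with one of the listed extensions."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def filter_supported(paths):
    """Keep only the paths whose extension is in the list, preserving order."""
    return [p for p in paths if is_supported(p)]


print(filter_supported(["book.epub", "notes.txt", "scan.PDF"]))
# prints ['book.epub', 'scan.PDF']
```

The comparison is case-insensitive, so scan.PDF is accepted; whether a given file actually opens still depends on its contents.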

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. This has a historical reason.
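A minimal sketch of that import, written so that it does not fail when the library is missing. The helper name load_fitz is hypothetical, and the guard around the import is an addition for illustration; a plain import fitz is what the description refers to.

```python
import importlib


def load_fitz():
    """Return the module imported under the name `fitz`, or None if unavailable."""
    try:
        return importlib.import_module("fitz")
    except ImportError:
        return None


fitz = load_fitz()
if fitz is None:
    print("PyMuPDF is not installed")
else:
    print("PyMuPDF imported as", fitz.__name__)
```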

257 questions

1 answer

Developing a generalized logic of getting highlighted area from multiple pdfs into pandas dataframe using python

I have created a solution in Python that extracts highlighted portions from a PDF using pymupdf (imported as fitz). This is the code: def _parse_highlight(annot: fitz.Annot, wordlist: List[Tuple[float, float, float, float, str, int, int,…

python pandas dataframe pymupdf

asked Dec 30 '21 at 11:52

technophile_3

0 answers

Programmatically change printer setting for each page in pdf file

I am using Python 3.10 and win32api to send print jobs to a printer. I can change some settings (such as the tray) before printing and it works fine. The problem is that I cannot update the settings for each page. I browse the PDF using pymupdf but it seems there is…

python-3.x windows winapi network-printers pymupdf

asked Dec 16 '21 at 15:40

khelili miliana

3,730 reputation, 2 gold badges, 15 silver badges, 28 bronze badges

0 answers

How to remove text layer from pdf using python

I need to remove all text information from a PDF file. The resulting file should look like a scan: only images wrapped as a PDF, with no text that you can copy or select. I am currently using a Ghostscript command: import os ... os.system(f"gs -o {output_path}…

python pdf text ghostscript pymupdf

asked Nov 08 '21 at 22:48

Demetry Pascal

1 answer

Overlay 2 pdf files by each page using pymupdf

I need to combine (merge/overlay) two PDF files page by page, with each page of the second placed on top of the matching page of the first. I've tried this code: import fitz doc1 = fitz.open(background) doc2 = fitz.open(only_text_path) doc1.insertPDF(doc2) but it only concatenates doc1 + doc2, doesn't…

python pdf merge overlay pymupdf

asked Oct 30 '21 at 18:55

Demetry Pascal

1 answer

How to highlight multiple keywords in a .pdf file using PyMuPDF library

I am able to highlight all the occurrences of a single word in a .pdf file using this, but I am unable to highlight multiple keywords. Here's my code: import fitz import os keywords = ["remote","setup"] pdfFile = "\D:\Python_Scripts\Email…

python pdf highlight pymupdf

asked Jul 06 '21 at 10:33

Amir Khan

0 answers

Extract GPA from Resume through Python Using PyMuPDF

We wrote a program that extracts all the information from a simple resume as a string, line by line. Now I want to extract the GPA from that string. I have tried a lot but could not come up with an approach. So if anyone could figure this out, it would be very…

python pdf pymupdf

asked Jun 26 '21 at 09:25

Abrar Hussain

1 answer

python - read pdf ignoring header and footer

I have a pdf file that I am reading using pymupdf using the below syntax. import fitz # this is pymupdf with fitz.open('file.pdf') as doc: text = "" for page in doc: text += page.getText() Is there a way to ignore the header and…

python pdf pymupdf

asked Jun 22 '21 at 11:29

Jayashree Sridhar

1 answer

how to extract text from a selection of pages in a larger pdf using pymupdf?

I know there are many libraries to extract text from PDF. Specifically, I've been having some difficulty with pymupdf. From the documentation here: https://pymupdf.readthedocs.io/en/latest/app4.html#sequencetypes I was hoping to use select() to pick…

python pdf nlp pymupdf

asked Jun 01 '21 at 02:54

Katie Melosto

1,047 reputation, 2 gold badges, 14 silver badges, 35 bronze badges

1 answer

How to find table grid lines in PDF files?

To more accurately extract table-like data embedded within table cells, I would like to be able to identify table cell boundaries in PDFs like this: I have tried extracting such tables using Camelot, pdfplumber, and PyMuPDF, with varying degrees of…

python pdf-extraction python-camelot pymupdf pdfplumber

asked Mar 03 '21 at 19:26

Mark Turner

2 answers

Is there any way that I can identify whether the PDF is edited/tampered and the exact location where the PDF is edited/tampered using Python?

I am working on identifying forgery/tampering in bank statement PDF documents. Info metadata and XMP metadata are not always present in the PDFs that I have, so I am not able to create any generalized rule to identify tampered PDFs. I am using Python…

python pdf pdfminer pymupdf tampering

asked Feb 09 '21 at 03:43

Abhishek Tanksali

2 answers

selecting the exact match using pymupdf-page.searchFor()

Below is a piece of my code, where I'm searching for a particular word and extracting its coordinates. The documentation gives the signature page.searchFor(needle, hit_max=16, quads=False, flags=None). Searches for needle on a page. Upper/lower…

python pymupdf

asked Oct 26 '20 at 11:14

RevolverRakk

1 answer

Using python PyMuPDF (fitz) to iterate through lines and check length of it and add a period if it meets the criteria

I am trying to iterate through each line of a page with the PyMuPDF library and check the length of the sentence; if it is less than 10 words, I would like to add a full stop. Pseudocode would be: #loop through the lines of the PDF #check number of…

python pymupdf

asked Aug 27 '20 at 20:06

user11464178

0 answers

How to save different versions of a single pdf, with different highlights, PyMuPDF, Python?

I have a pdf document and for simplicity, I want to make two (many) different edited versions of the same pdf. For example, in one of the pdf, I want all the "and" in the pdf to be highlighted, and in the second I want all "the" to be highlighted. I…

python pdf pymupdf

asked Aug 25 '20 at 17:19

yoyo yoyo

1 answer

How to Highlight a specific line/text in a pdf using Python

I am new to python and have been working on a project to make a new pdf with highlighted text. I am using pymupdf to get the text and am storing the text, font size, and the index of the text. I found a way to highlight the text but it searches and…

python string pdf text pymupdf

asked Aug 25 '20 at 09:41

yoyo yoyo

2 answers

Python PyMuPDF looping next pages

I'm using the code below to open a PDF file and convert it into an image file as output. Now I'm trying to figure out how I can loop to the next page and convert it into the same output file. Any help is much appreciated! # display image on the canvas def…

python pymupdf

asked Aug 19 '20 at 06:04

faizal_a
