Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”.

MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. This has a historical reason.

257 questions
0
votes
3 answers

How do I merge items from a list avoiding repeated content inside the items?

Edit 4: Simpler example of what I want to do: I have a list like this: sentences = ['Hello, how are','how are you','you doing?'] And I want to turn it into a string like this: sentence = 'Hello, how are you doing?' Any help is appreciated! Original…
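
One possible approach, sketched under the assumption that each item may start with text that already ends the merged result so far: find the longest such overlap and append only the remainder. The function name merge_overlapping is made up for illustration and is not taken from the question or its answers.

```python
def merge_overlapping(parts):
    """Join strings, dropping text that repeats at each boundary."""
    merged = ""
    for part in parts:
        # Find the longest suffix of `merged` that is also a prefix of `part`.
        overlap = 0
        for size in range(min(len(merged), len(part)), 0, -1):
            if merged.endswith(part[:size]):
                overlap = size
                break
        merged += part[overlap:]
    return merged


sentences = ['Hello, how are', 'how are you', 'you doing?']
sentence = merge_overlapping(sentences)
print(sentence)
```

This only removes repeats that sit exactly at the seam between two consecutive items; repeated text elsewhere in an item is left alone.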
0
votes
1 answer

PyMuPDF ModuleNotFoundError

I successfully ran the command pip install pymupdf, which reported: Successfully installed pymupdf-1.18.15. However, import fitz and import pymupdf both raise a ModuleNotFoundError. Why is Python giving me a ModuleNotFoundError?
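
A common cause of this symptom (an assumption here, since the excerpt does not show the environment) is that pip installed the package for a different interpreter than the one running the script. The sketch below uses only the standard library to show which interpreter is running and which of the two module names it can find; find_importable is a hypothetical helper name.

```python
import importlib.util
import sys


def find_importable(names=("fitz", "pymupdf")):
    """Return the module names from `names` that this interpreter can locate."""
    return [name for name in names if importlib.util.find_spec(name) is not None]


print("Interpreter:", sys.executable)
print("Importable PyMuPDF names:", find_importable())
# If the list is empty, installing with this same interpreter usually helps:
#     python -m pip install pymupdf
```

If the interpreter printed here differs from the one pip reports with pip --version, the package most likely went into another environment.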
0
votes
0 answers

How to use lxml to parse the XML extract of PyMuPDF?

I read each page of a PDF and appended every XML extract to a string variable, using Page.get_text("xml"). The text output consisted of many units of \n
0
votes
1 answer

PyMuPDF - Scale a Quad from center in all directions

I'm searching for text in a pdf and extracting a quad and adding a polygon_annot around it. But I would like to scale the polygon_annot. How can I do that? Below is my code: for inst in text_instances: inst = inst.transform(fitz.Matrix(2, 2)) …
Gangula
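
A scaling matrix such as fitz.Matrix(2, 2) scales about the origin, so the transformed quad also moves away from its original position. A minimal sketch of scaling about the quad's own center instead, using plain (x, y) tuples rather than PyMuPDF objects; the helper name and the sample corner values are assumptions for illustration.

```python
def scale_about_center(points, factor):
    """Scale (x, y) points about their centroid by `factor`."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    # Move each point along the line from the centroid through the point.
    return [(cx + (x - cx) * factor, cy + (y - cy) * factor) for x, y in points]


# Hypothetical quad corners in the order ul, ur, ll, lr.
corners = [(10.0, 10.0), (30.0, 10.0), (10.0, 20.0), (30.0, 20.0)]
print(scale_about_center(corners, 2))
```

The scaled points would still need to be converted back into a quad before being passed to the annotation call.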
0
votes
2 answers

Why does a PyMuPDF Document raise the error "no attribute 'new_page'" when it is a PDF?

I'm working on annotating a PDF and I want to change its color. I was guided to this helpful link: https://pymupdf.readthedocs.io/en/latest/faq.html#how-to-add-and-modify-annotations I used the code in the link: # -*- coding: utf-8…
Katie Melosto
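
One plausible explanation (an assumption, since the installed version is not shown) is a mismatch between the documentation and the installed release: older PyMuPDF releases used camelCase method names such as newPage, while the current documentation uses snake_case names such as new_page. The sketch below shows a generic fallback that tries several names in order; the helper and the stand-in class are hypothetical and do not touch PyMuPDF itself.

```python
def call_first_available(obj, names, *args, **kwargs):
    """Call the first method in `names` that `obj` actually has."""
    for name in names:
        method = getattr(obj, name, None)
        if callable(method):
            return method(*args, **kwargs)
    raise AttributeError(f"{type(obj).__name__} has none of: {', '.join(names)}")


class OldStyleDocument:
    """Stand-in for a document object that only has the camelCase name."""

    def newPage(self):
        return "new page created"


print(call_first_available(OldStyleDocument(), ("new_page", "newPage")))
```

Upgrading the package so that it matches the documentation being followed is usually the cleaner fix; the fallback is mainly useful when the code has to run against several versions.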
0
votes
1 answer

Comparing keywords with PDF files

Here is the program that opens the files by folder name and extracts data. Now I want to compare the data with the keywords that I use in the program below. But it gives me: pdfReader = pdfFileObj.loadPage(0) AttributeError:…
0
votes
1 answer

How to read pdf files with pymupdf in PyQt5?

I want to open a PDF file through the pilihfile push button, then display its name in textEdit and its contents in textEdit_2 using PyMuPDF. But I got an error that says: cannot open ('D:/Kuliah/KRIP.pdf', 'PDF Files (*.pdf)'): Invalid…
Henry
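
The value in the error message is a tuple of a path and a filter string, not a plain path. If it comes from PyQt5's QFileDialog.getOpenFileName (an assumption, since the excerpt does not show the call), that function returns a (file name, selected filter) pair, and only the first element should be passed on. A minimal sketch with a hypothetical helper and no Qt dependency:

```python
def chosen_path(dialog_result):
    """Return the file path from a (path, selected_filter) pair, or None if empty."""
    path, _selected_filter = dialog_result
    # An empty path means the dialog was cancelled.
    return path or None


# Value shaped like the one in the error message.
result = ('D:/Kuliah/KRIP.pdf', 'PDF Files (*.pdf)')
print(chosen_path(result))
```

The returned string is what would then be handed to fitz.open.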
0
votes
1 answer

Image replacement using PyMuPDF

I'm using PyMuPDF to replace images. But when I have a dictionary of images mapped to their bbox coordinates, only the image on the first page gets replaced. How can I get all the images in the dictionary to be replaced? Here's my code: 'bbval' is…
vbadwaj
0
votes
0 answers

PyQt multithreading: why does the worker thread block the main thread?

When I try to load a .pdf that is larger than 10 MB or has more than 300 pages, the worker thread blocks the main thread. I don't know how to use QThread correctly. I want a signal to be emitted to the main thread each time pixmap_page_load runs. Here is…
nevermind_15
0
votes
1 answer

How can I avoid extracting small image elements from PDF file in python?

I am trying to extract all the images from this PDF file:…
Suraj Kadam
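
One way to skip tiny elements is to filter on pixel dimensions before extracting anything. The sketch below assumes each image entry is a tuple with the width at index 2 and the height at index 3, as in the items returned by PyMuPDF's page image list; the threshold values, the helper name and the sample entries are made up for illustration.

```python
def keep_large_images(image_items, min_width=100, min_height=100):
    """Drop entries whose pixel width or height is below the given minimum."""
    return [
        item for item in image_items
        if item[2] >= min_width and item[3] >= min_height
    ]


# Hypothetical entries: (xref, smask, width, height, ...).
items = [(12, 0, 640, 480), (13, 0, 16, 16), (14, 0, 300, 40)]
print(keep_large_images(items))
```

Suitable thresholds depend on the document, so they are worth tuning against a few sample pages.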
0
votes
0 answers

Extract SVG and text from a PDF in Python

I need to get the text and SVGs embedded in a PDF using Python. I tried PyPDF2, PyPDF4 and tika, but they did not work. I tried using PyMuPDF but I am getting the error below. Can someone help me with it? I am using Python 3.8 and PyCharm. All modules required for pymupdf are…
Rohit T
0
votes
0 answers

My Python exe file does not work on a shared disk but works in a Jupyter notebook

I wrote a Python script that reads the PDF files in the current folder (on a shared disk), looks for a specific number, and then searches for that number in another folder (on the same shared disk). If there is a match, I merge both files into a new file with PyMuPDF. After that,…
0
votes
1 answer

Slicing with PyMuPDF

I'd like to mark several keywords in a pdf document using Python and pymupdf. The code looks as follows (source: original code): import fitz doc = fitz.open("test.pdf") page = doc[0] text = "result" text_instances = page.searchFor(text) for…
danik
0
votes
1 answer

How do I delete line breaks in PDF text extraction in Python?

I used PyMuPDF to get the text in the PDF, here is my code import fitz pdf_document = "KRIP.pdf" doc = fitz.open(pdf_document) page1 = doc.loadPage(0) page1text = page1.get_text() print("Text from PDF: ", page1text) the output should…
brianK
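
Extracted text usually keeps the line breaks of the page layout, so one option is to post-process the string. A minimal sketch, assuming that a blank line marks a paragraph boundary and that single line breaks inside a paragraph can be replaced by spaces; unwrap_lines is a hypothetical helper and the sample text is invented.

```python
import re


def unwrap_lines(text):
    """Join lines within a paragraph; keep blank lines as paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # split() with no arguments also collapses runs of spaces and newlines.
    return "\n\n".join(" ".join(p.split()) for p in paragraphs)


sample = "This is a line\nthat was wrapped.\n\nSecond\nparagraph.\n"
print(unwrap_lines(sample))
```

Words hyphenated across a line break are not rejoined by this sketch and would need separate handling.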
0
votes
0 answers

Replace text in a PDF file in Python using Fitz

Has anyone tried to replace text in a PDF file using Fitz from the PyMuPDF library? I have tried the code below, and I am not sure whether I am close to the result or whether it is impossible with this library: import fitz file_name =…