Questions tagged [pymupdf]

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”. MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats. NOTE: It is imported in Python as fitz.

PyMuPDF is a Python binding for MuPDF – “a lightweight PDF and XPS viewer”.

MuPDF can access files in PDF, XPS, OpenXPS, CBZ (comic book archive), FB2 and EPUB (e-book) formats.

These are files with extensions .pdf, .xps, .oxps, .cbz, .fb2 or .epub (so you can develop e-book viewers in Python).
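
As a rough illustration of the extension list above, the following sketch checks whether a path has one of those extensions before it is handed to PyMuPDF. The helper name `is_supported` and the constant `SUPPORTED_EXTENSIONS` are made up for this example and are not part of the library; the set only mirrors the extensions named in this description.

```python
from pathlib import Path

# Extensions named in the tag description above (illustrative, not exhaustive).
SUPPORTED_EXTENSIONS = {".pdf", ".xps", ".oxps", ".cbz", ".fb2", ".epub"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of those listed above.

    Hypothetical helper: it only inspects the file name, not the file content.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

A file that passes this check could then be opened with fitz.open(path), assuming PyMuPDF is installed.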

PyMuPDF provides access to many important functions of MuPDF from within a Python environment.

Note on the Name fitz:

The standard Python import statement for this library is import fitz. The name is kept for historical reasons.
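
Because the import name (fitz) differs from the package name (PyMuPDF), a quick check like the sketch below may help when an import fails. The function name `pymupdf_importable` is hypothetical, and the check only tells you that some module named fitz can be found, not that it is PyMuPDF.

```python
import importlib.util


def pymupdf_importable() -> bool:
    """Return True if a module named 'fitz' is found on the import path.

    This looks the module up without importing it.
    """
    return importlib.util.find_spec("fitz") is not None


if __name__ == "__main__":
    if pymupdf_importable():
        print("Found a module named fitz; use: import fitz")
    else:
        print("No module named fitz; install with: pip install PyMuPDF")
```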

257 questions
0
votes
0 answers

Python PDF parsing script fails :- mupdf: malloc of 51301 bytes failed

I'm attempting to parse data from around 53k pdfs stored on disk. The script I have iterates through a dataframe of filenames of pdfs and has a function which returns bounding boxes for each pdf and for each bbox parses the text data within that…
furbaw
  • 109
  • 1
  • 12
0
votes
2 answers

Can't install PyMuPDF although the Python library list has PyMuPDF

I tried to install PyMuPDF on Python 3.9. I first installed it with pip install PyMuPDF and re-checked with pip list. But when I imported PyMuPDF, I got: ModuleNotFoundError: No module named 'PyMuPDF'. Next, I tried to install PyMuPDF from the docs, it…
0
votes
0 answers

Add library/module to server

I am pretty new to python and would like to use the PyMuPDF library on a web server in order to modify PDFs. The problem is, I am unable to add/install any modules or libraries to/on the server. Is there a way to install all libraries and modules in…
jonsken
  • 111
  • 1
  • 11
0
votes
0 answers

Number of Entries in Xref Table

Is there a Java library that can give me the number of entries in the xref table of a PDF? PyMuPDF has Document.xref_length() for this, but I want it in Java.
maester
  • 1
  • 2
0
votes
0 answers

Zoom and crop a pdf document using PyMuPDF fitz and saving as pdf

I am trying to crop a pdf within a lambda and save the file. Ideally I just want to zoom in, as otherwise the OCR package does not recognize some of the fonts. The rectangle I am using just seems to shift the margins versus actually cropping or…
megv
  • 1,421
  • 5
  • 24
  • 36
0
votes
1 answer

AttributeError: 'Document' object has no attribute 'searchFor'

I want to write a simple program that asks the user to open a PDF file from any location, add image A to any page that contains the keywords "Orange County", and add image B to any page that contains the keywords "Hillsborough county", then save the…
Zac
  • 13
  • 1
  • 5
0
votes
2 answers

I'm trying to read PDFs one by one and then convert them into a dataframe

I've used 'fitz' from the PyMuPDF module to extract data and then pandas to convert the extracted data to a dataframe. #Code to read multiple pdfs from the folder: from pathlib import Path # returns all file paths that has .pdf as extension in the…
User1011
  • 143
  • 1
  • 10
0
votes
0 answers

PyMuPDF (fitz) not properly closing files, resulting in PermissionError [WinError 32]

I can't figure out why I'm getting a PermissionError when trying to clean up some temporary PDF files that are no longer needed. My script downloads a bunch of single-page PDFs into a /temp folder, then uses PyMuPDF to merge them into a single PDF.…
0
votes
1 answer

Why does pymupdf have an origin that is not in the top left corner?

I can't seem to figure out why the pymupdf tools for placing objects on pdf documents have the origin set at a seemingly random location. Notice that (0,0,100,100), which is x0 y1 x2 y2 (where y starts from top) starts from the middle of the…
negfrequency
  • 1,801
  • 3
  • 18
  • 30
0
votes
1 answer

How to add a background image to a PDF using the PyMuPDF module in Python

I am trying to add a background image to a PDF using PyMuPDF, but it creates a layer between the PDF and the image, as you can see in the output. How can I bypass (remove) the layer between the PDF and the background image? Please help me regarding this. This is how I…
Prabhat
  • 3
  • 3
0
votes
0 answers

Extracting html structure from PDF

I have a test pdf file with just a 3x3 table that is properly marked up with table headings and the like. What I want to do is extract the format of the table. Like so: left center right One Two Three If that table was in the pdf, I want…
Mat
  • 67
  • 1
  • 3
  • 17
0
votes
0 answers

draw_rect method of PyMuPDF is not working on certain pages of a PDF

I'm using the draw_rect method of PyMuPDF. It's not working on certain pages of the PDF. The following is the code for drawing rectangles. I tried plotting the same rect values on other pages and they were plotted correctly. doc = fitz.open(filepath ) x0,y0,x2,y2 =…
Nayana Madhu
  • 1,185
  • 5
  • 17
  • 34
0
votes
1 answer

Python: mupdf: invalid key in dict

I wrote the code below to remove annotations from a PDF file and then save it as a new PDF. However, I am getting RuntimeError: invalid key in dict. Below is the code: import fitz import re doc = fitz.open("test.pdf") for i in range(doc.pageCount): …
Sundaram
  • 1
  • 4
0
votes
1 answer

How can I transfer annotations between PDFs (e.g. using pymupdf)

I have been looking through the pymupdf documentation, and while there is a lot there and I can see how to identify annotations (Annot class), I can't work out how to copy an annotation that I have found in one document into another.…
Diomedea
  • 193
  • 1
  • 9
0
votes
0 answers

How to attach images using pymupdf

I have a PDF in which 2 pages have a total of 6 attachment boxes. Clicking on one lets you choose an image file, which is then inserted into the PDF. I want to do this using Python. I have tried pymupdf and after checking it…
Mr Anonymous
  • 75
  • 10