Convert Rect location from PyMuPDF to a page number
I search for certain text such as "exam" with PyMuPDF and get the rectangle locations of the matches. I then highlight the text in the PDFs at those locations. I now want to delete all pages that do not contain this text, so I use the doc.select() function to select the pages I want to keep before saving a new PDF that contains only the pages with highlighted text.
The Issue
doc.select() expects a sequence (for example a list) of the page numbers I want to keep.
So what I tried was to pass the list of rectangles returned by page.search_for() to this function, but I got the following error:
ValueError: bad page number(s)
I now understand that I need to convert the rectangles' coordinates to page numbers. But I do not know how to do this, and as far as I can tell it is not mentioned anywhere in the docs (correct me if I am wrong).
Current code
from pathlib import Path
import fitz

directory = "pdfs"
# iterate over files in that directory
files = Path(directory).glob('*')
for file in files:
    doc = fitz.open(file)
    for page in doc:
        ### SEARCH
        text = "Exam"
        text_instances = page.search_for(text)
        ### HIGHLIGHT
        for inst in text_instances:
            highlight = page.add_highlight_annot(inst)
            highlight.update()
    ### OUTPUT
    doc.select(text_instances)
    doc.save("output.pdf", garbage=4, deflate=True, clean=True)
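A rectangle returned by page.search_for() only holds coordinates within a single page, so there is nothing in it to convert; the page it was found on does carry its 0-based index as page.number. The following is a minimal sketch of collecting those indices while searching and passing them to doc.select(), not a definitive solution. The helper names pages_with_text and process_folder, the "*.pdf" glob pattern, and the per-file output name are assumptions made for illustration and are not part of the original code.

```python
from pathlib import Path


def pages_with_text(doc, needle):
    """Highlight every hit of `needle` and return the 0-based numbers of pages with a hit."""
    keep = []
    for page in doc:
        rects = page.search_for(needle)
        if not rects:
            continue  # no match on this page, so it will not be kept
        for rect in rects:
            highlight = page.add_highlight_annot(rect)
            highlight.update()
        # page.number is the page's index in the document, which doc.select() accepts
        keep.append(page.number)
    return keep


def process_folder(directory, needle="Exam"):
    """Hypothetical driver: write a reduced copy of each PDF found in `directory`."""
    # Imported here so the helper above can be used without PyMuPDF installed.
    import fitz

    for file in Path(directory).glob("*.pdf"):
        doc = fitz.open(file)
        keep = pages_with_text(doc, needle)
        if keep:  # skip documents without any match
            doc.select(keep)
            # Assumed naming scheme: one output per input instead of a shared "output.pdf"
            doc.save(f"{file.stem}_output.pdf", garbage=4, deflate=True, clean=True)
        doc.close()
```

With this sketch, calling process_folder("pdfs") would write one reduced copy per input file into the current working directory. The page numbers are gathered across all pages before doc.select() is called once per document, rather than reusing the rectangles from the last page searched.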
PDF that I used for testing purposes:
pdf