List matches of page.search_for() with PyMuPDF

Question

I'm writing a script to highlight text from a list of quotes in a PDF. The quotes are in the list text_list. I use this code to highlight the text in the PDF:

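import fitz  # PyMuPDF

# Load document (filename and text_list are defined earlier in the script)
doc = fitz.open(filename)

count = 0
# Iterate over pages
for page in doc:
    # Search for each quote and highlight every hit on this page
    for text in text_list:
        rl = page.search_for(text, quads=True)
        if rl:  # only annotate when something was found
            page.add_highlight_annot(rl)
            count += len(rl)

# Print how many results were found
print(f"{count} instances highlighted in pdf")
doc.save("highlighted.pdf")  # output file name is a placeholder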

I now want a list of the quotes that were not found (and therefore not highlighted). Is there a simple way to get a list of the matches page.search_for() found, or of the quotes it did not find?

score 2 · Accepted Answer · answered Nov 26 '22 at 11:22


The list of hit rectangles / quads rl will be empty if nothing was found. I suggest you check `if rl == []:` and make adding the highlight depend on this, as well as adding the respective text to some no_hit list.
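A minimal sketch of the empty-check idea, assuming `pages` is any iterable of PyMuPDF-style pages (objects with `search_for(text, quads=True)` and `add_highlight_annot(quads)`, such as an opened `fitz` document). Because the question iterates over pages, a quote can miss on one page and hit on another, so this sketch counts hits per quote across the whole document instead of recording a miss per page. The function name `highlight_and_count` is hypothetical.

```python
def highlight_and_count(pages, quotes):
    """Highlight every hit of every quote and count hits per quote.

    `pages` is assumed to be an iterable of PyMuPDF-style page objects
    exposing search_for(text, quads=True) and add_highlight_annot(quads).
    """
    hits = dict.fromkeys(quotes, 0)  # every quote starts with zero hits
    for page in pages:
        for text in quotes:
            rl = page.search_for(text, quads=True)
            if rl == []:  # nothing found on this page: no highlight
                continue
            page.add_highlight_annot(rl)
            hits[text] += len(rl)  # one entry in rl per hit
    # Quotes whose count never left 0 were not found on any page
    no_hit = [text for text in quotes if hits[text] == 0]
    return hits, no_hit
```

With PyMuPDF this could be called as `hits, no_hit = highlight_and_count(fitz.open(filename), text_list)`; the document still has to be saved afterwards for the highlights to persist. The `hits` dictionary also answers the "list of matches" part of the question.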

Probably better the other way round: make your text list a Python set. Whenever a text is found, put it into another set, found_set. At the end of processing, subtract the found set from the text_list set (set difference); what remains are the quotes that were never found.
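The set-based variant can be sketched as follows, again assuming PyMuPDF-style page objects with `search_for` and `add_highlight_annot`; the function name `missing_quotes` is hypothetical. Note that sets are unordered and collapse duplicates, so the result does not follow the order of the original list.

```python
def missing_quotes(pages, quotes):
    """Return the set of quotes that were not found on any page.

    `pages` is assumed to be an iterable of PyMuPDF-style page objects
    exposing search_for(text, quads=True) and add_highlight_annot(quads).
    """
    text_set = set(quotes)  # duplicates in the input collapse here
    found_set = set()
    for page in pages:
        for text in text_set:
            rl = page.search_for(text, quads=True)
            if rl:  # at least one hit on this page
                page.add_highlight_annot(rl)
                found_set.add(text)
    # Set difference: quotes never added to found_set
    return text_set - found_set
```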


Jorj McKie



Thank you, that helped me. Because the script iterates over the pages (and thus rl is often empty), I slightly modified your solution: I appended text to the no_hit list if `rl != []` and then compared that list against text_list. – SamVimes Nov 26 '22 at 11:48
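The comment's approach (record a quote whenever `rl != []`, then compare against text_list) can be sketched like this, assuming the same PyMuPDF-style page objects as above; the function name `not_found_in_order` is hypothetical. Unlike a set difference, the comparison keeps the original order of text_list.

```python
def not_found_in_order(pages, text_list):
    """Collect quotes with at least one hit, then compare against text_list.

    `pages` is assumed to be an iterable of PyMuPDF-style page objects
    exposing search_for(text, quads=True) and add_highlight_annot(quads).
    """
    found = []  # quotes that had at least one hit on some page
    for page in pages:
        for text in text_list:
            rl = page.search_for(text, quads=True)
            if rl != []:
                page.add_highlight_annot(rl)
                found.append(text)
    found_lookup = set(found)  # fast membership tests for the comparison
    # Keep text_list order; anything never recorded was not found
    return [text for text in text_list if text not in found_lookup]
```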
