
I am trying to write a Python script that automates finding text in a PDF and highlighting it accordingly.

I am using the PyMuPDF module. It works for some PDFs. However, for the target PDF (drawings of components and property tables), it saves the output as a blank PDF with no data and a few blank highlights.

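import fitz  # PyMuPDF

doc = fitz.open("c5.pdf")
page = doc[0]  # first page only

text = "a"

# Each hit is a rectangle around one occurrence of the text.
# Newer PyMuPDF releases name these methods search_for / add_highlight_annot.
text_instances = page.searchFor(text)

for inst in text_instances:
    highlight = page.addHighlightAnnot(inst)

doc.save("out.pdf", garbage=4, deflate=True, clean=True)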
user12140050

1 Answer


Your PDF probably contains elements that look like text but are actually something else, such as vector graphics or images. In that case, the text search cannot find anything.
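
One rough way to check this is to count what a page actually contains. The sketch below is only an illustration, not a definitive diagnosis: the helper names `diagnose_page`, `explain`, and `diagnose_file` are made up for this example, and it assumes a recent PyMuPDF release where `Page` offers the snake_case methods `get_text`, `get_images`, and `get_drawings` (older releases, like the one in the question, used camelCase names).

```python
def diagnose_page(page):
    """Count words, images, and vector drawings on a page-like object.

    Assumption: `page` behaves like a PyMuPDF Page and provides
    get_text("words"), get_images(full=True), and get_drawings().
    """
    words = page.get_text("words")        # one tuple per extractable word
    images = page.get_images(full=True)   # one entry per embedded image
    drawings = page.get_drawings()        # one dict per vector path
    return {
        "words": len(words),
        "images": len(images),
        "drawings": len(drawings),
    }


def explain(counts):
    """Turn the counts into a short, human-readable verdict."""
    if counts["words"] > 0:
        return "searchable text present"
    if counts["images"] or counts["drawings"]:
        return "no text layer: content is images or vector graphics"
    return "page appears empty"


def diagnose_file(path, page_number=0):
    """Hypothetical convenience wrapper: open a PDF and inspect one page."""
    import fitz  # PyMuPDF; imported here so the helpers above work without it

    doc = fitz.open(path)
    try:
        return diagnose_page(doc[page_number])
    finally:
        doc.close()
```

Calling `explain(diagnose_file("c5.pdf"))` on the file from the question would show whether page 0 has any text layer at all. If the word count is zero while images or drawings are present, the visible "text" is most likely drawn as graphics, and a text search has nothing to match.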

Please submit an issue on my PyMuPDF repo with a sample PDF so that I can investigate this.

Jorj McKie