I'm using PyMuPDF to annotate some text in a .pdf document with the following code:

import fitz
import re


def data_(text):
    annotation_text = r"(amet)"
    for line in text:
        # Search once per line and reuse the match object
        search = re.search(annotation_text, line, re.IGNORECASE)
        if search:
            yield search.group(1)


def includeannotation(path_included):
    document = fitz.open(path_included)

    for page in document:
        page.wrap_contents()
        obs = data_(page.get_text("text").split("\n"))
        for data in obs:
            # search_for returns one rectangle per occurrence on the page
            catchs = page.search_for(data)
            for catch in catchs:
                # Pass the single rectangle, not the whole list
                page.add_redact_annot(catch, fontsize=11, fill=(0, 0, 0))
        page.apply_redactions()
    document.save("annotation.pdf")
    print("end - created")


path_included = "/content/document.pdf"

includeannotation(path_included)

The source .pdf document contains the following text:

[image]

Applying the code above adds the annotation for the text "amet", and I get the following result:

[image]

The result is mostly what I expected: the library has blacked out "amet". However, it has also deleted a word on the following line, and that word is not covered by the black annotation. It looks like a restyling problem.

How can I avoid this problem?
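
One possible explanation, stated here as an assumption rather than a confirmed diagnosis: applying a redaction removes any text whose bounding box overlaps the redaction rectangle, and the rectangles returned by the search can be tall enough to touch the line below. If that is the cause, shrinking each rectangle vertically before adding the redaction might help. The sketch below works on plain `(x0, y0, x1, y1)` tuples so it runs without PyMuPDF; the helper name `shrink_rect_vertically` and the default margin of 2 points are hypothetical choices, not part of the library.

```python
def shrink_rect_vertically(rect, margin=2.0):
    """Return a copy of rect (x0, y0, x1, y1) with top and bottom pulled inward.

    The margin is capped at 40% of the rectangle height so that the
    result always keeps a positive height.
    """
    x0, y0, x1, y1 = rect
    height = y1 - y0
    margin = min(margin, 0.4 * height)
    return (x0, y0 + margin, x1, y1 - margin)


# A hit that is 12 points tall loses 2 points at the top and at the bottom.
print(shrink_rect_vertically((10, 100, 50, 112), margin=2))
```

In the loop above, the shrunken coordinates could then be wrapped in a `fitz.Rect(...)` and passed to the redaction call in place of the original search hit. The right margin depends on the font and line spacing of the document, so it would need some experimentation.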

Martin Thoma
user3043636