Problem Statement
- Read a PDF and search for a word.
- If the word is found, annotate it and crop an area around the annotated text from the PDF page.
- Each cropped image should contain only one annotation.
Libraries and Versions
- python-3.6
- fitz-0.0.1.dev2
- pymupdf-1.17.5
Issue I am facing
For the first two iterations, both the annotation and the crop work exactly as expected. From the next occurrence of the search word in `text_instances` onward, however, both the annotation of the word and the crop around it fail. I can't find a solution for this problem.
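One possible explanation, offered as a hypothesis. PyMuPDF's documentation for `setCropBox` says that after the call `Page.rect` is shifted so that it starts at (0, 0). In other words, page coordinates are then measured from the new crop box's top-left corner. The rectangles in `text_instances` were all computed before any cropping. Once a crop box with a non-zero top-left corner has been set, they would no longer point at the words. The arithmetic is sketched below in plain Python. The helper name `shift_into_cropbox` and the sample numbers are hypothetical:

```python
def shift_into_cropbox(rect, cropbox):
    """Re-express `rect` (x0, y0, x1, y1), measured from the original page
    origin, relative to the top-left corner of `cropbox` (x0, y0, x1, y1)."""
    cx0, cy0 = cropbox[0], cropbox[1]
    x0, y0, x1, y1 = rect
    return (x0 - cx0, y0 - cy0, x1 - cx0, y1 - cy0)


# A hit found before cropping, and a crop box whose top-left is not (0, 0):
hit = (120.0, 700.0, 190.0, 712.0)
cropbox = (0.0, 100.0, 595.0, 842.0)
print(shift_into_cropbox(hit, cropbox))  # (120.0, 600.0, 190.0, 612.0)
```

Under this assumption, reusing the old hit `(120, 700, 190, 712)` after cropping would place the annotation 100 points below the word. The first iteration that breaks would depend on where each crop box starts, which may explain why the first two iterations look correct.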
import fitz  # PyMuPDF


def cropPdf(pdfName, word):
    c = 0
    # Open the PDF file with fitz (PyMuPDF)
    fitz_doc = fitz.open(pdfName)
    # Get the first page of the document
    fitz_page = fitz_doc[0]
    # Find all instances of the search word on the page
    text_instances = fitz_page.searchFor(word)
    # Iterate through each text instance
    for text_cord in text_instances:
        c = c + 1
        # Add a highlight (rectangle annotation) around the search word
        highlight = fitz_page.addRectAnnot(text_cord)
        # Get the bounding-box coordinates of the annotation
        x0, y0, x1, y1 = highlight.rect
        # Here I set the cropping area around the annotated text
        fitz_page.setCropBox(fitz.Rect(x0 + 600, y0 + 600, x0 - 600, y0 - 600))
        # Render the (cropped) page to an image
        pix = fitz_page.getPixmap()
        print(fitz_page.number)
        base_name_highlight = "output" + str(c) + ".png"
        # Save the cropped area as a PNG file
        pix.writeImage("./highlight_folder/" + base_name_highlight)
        # Delete the annotation so that it does not show up again in the
        # cropped area of the next occurrence of the word.
        fitz_page.deleteAnnot(highlight)


cropPdf(pdfName="A4_4.pdf", word="INSULATION")
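A possible fix, sketched under assumptions: leave the page's crop box alone and render only the area of interest with the `clip` parameter of `getPixmap`. The page's coordinate system then never changes, so every rectangle returned by `searchFor` stays valid. The sketch targets the camelCase API of PyMuPDF 1.17.x used in the question. Newer releases renamed these methods, e.g. to `search_for`, `add_rect_annot` and `get_pixmap`. The function name `crop_hits`, the `out_dir` and `margin` parameters, and the 100-point default margin are hypothetical choices, not from the original code:

```python
import os


def crop_hits(pdf_name, word, out_dir="./highlight_folder", margin=100):
    """Save one PNG per occurrence of `word` on the first page, each showing
    only that occurrence's rectangle annotation. Returns the saved paths.

    Sketch for PyMuPDF 1.17.x (camelCase API); `margin` is in PDF points.
    """
    import fitz  # PyMuPDF; imported here so this module loads without it

    os.makedirs(out_dir, exist_ok=True)
    doc = fitz.open(pdf_name)
    page = doc[0]
    # Search once; the crop box is never changed, so these stay valid.
    hits = page.searchFor(word)
    saved = []
    for n, hit in enumerate(hits, start=1):
        annot = page.addRectAnnot(hit)
        # Grow the hit by `margin` on every side (top-left corner first),
        # then intersect with the page so the clip stays on the page.
        clip = fitz.Rect(hit.x0 - margin, hit.y0 - margin,
                         hit.x1 + margin, hit.y1 + margin) & page.rect
        # Render only the clip area instead of calling setCropBox.
        pix = page.getPixmap(clip=clip)
        path = os.path.join(out_dir, f"output{n}.png")
        pix.writePNG(path)
        saved.append(path)
        # Remove this annotation so the next image shows only its own hit.
        page.deleteAnnot(annot)
    doc.close()
    return saved
```

Usage would mirror the original call, e.g. `crop_hits("A4_4.pdf", "INSULATION")`. Because annotations are added and deleted one at a time, a neighbouring occurrence that falls inside the same clip area appears unannotated in that image.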
Result Images