I have written a script to extract some information from a PDF file.
My code:
for page in doc:
    rect = fitz.Rect(22, 52, 562, 802)  # crop page margins to ignore header, footer, left side
    blocks = page.get_text("blocks", clip=rect, flags=fitz.TEXTFLAGS_TEXT)
    for i in blocks:
        if i[-3][0].isdigit():  # block starts with a digit: title or subtitle
            if i[-3].partition(" ")[0].count('.') == 0:  # no dot in the number: top-level title
                nr = i[-3].partition(" ")[0]
                txt = i[-3].partition(" ")[2]
            else:  # dotted number such as 1.2: subtitle
                sub_nr = '="' + i[-3].partition(" ")[0] + '"'
                sub_txt = i[-3].partition(" ")[2]
        elif i[-3].startswith("[V2G"):  # requirement block
            id = i[-3].partition("\n")[0].replace("[", " ").replace("]", " ")
            text = i[-3].partition("\n")[2].strip()
            data.append(req(filename, nr, txt, sub_nr, sub_txt, id, text))
I would like to add another condition for setting the txt variable, depending on the font name:
if font1 == 'Cambria-Bold':
    txt=.....
How can I get the font name?
I found the method page.get_fonts() in the PyMuPDF library,
but it lists all the fonts used on the page, not the font of a specific piece of text.
How can I use this method for my purpose?
Or is there another Python library that can give me the font information?
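One direction that might work, as a minimal sketch only: page.get_text("dict") returns the same blocks broken down into lines and spans, and each span carries a "font" entry next to its "text". The helper names dominant_font, block_text and blocks_with_font below are hypothetical, and treating the font that covers the most characters as "the font of the block" is my own assumption, not something PyMuPDF defines.

```python
from collections import Counter


def dominant_font(block):
    """Return the font name that covers the most characters in one block.

    `block` is assumed to be one entry of page.get_text("dict")["blocks"].
    Returns None for blocks without text lines (for example image blocks).
    """
    counts = Counter()
    for line in block.get("lines", []):
        for span in line["spans"]:
            counts[span["font"]] += len(span["text"])
    return counts.most_common(1)[0][0] if counts else None


def block_text(block):
    """Join the span texts of a block, one output line per text line."""
    return "\n".join(
        "".join(span["text"] for span in line["spans"])
        for line in block.get("lines", [])
    )


def blocks_with_font(page, rect):
    """Yield (text, font) for every text block of `page` inside `rect`.

    `page` is assumed to be a PyMuPDF page and `rect` a fitz.Rect.
    """
    page_dict = page.get_text("dict", clip=rect)
    for block in page_dict["blocks"]:
        font = dominant_font(block)
        if font is not None:  # skip blocks that contain no text
            yield block_text(block), font
```

With something like this, the inner loop could iterate over blocks_with_font(page, rect) instead of the "blocks" tuples and test the returned font name, for example against 'Cambria-Bold', before assigning txt.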
Thank you for your help.