I am trying to extract the hyperlinks on each page of a PDF, together with their anchor text, using the PyMuPDF library. I can extract the hyperlinks with their page numbers, but I have not been able to extract the anchor text (the words each hyperlink is attached to).
Can anyone help me?
Here is the code:
import fitz # PyMuPDF
result = []
with fitz.open(file) as doc:
    for page_no in range(1, len(doc)+1):
        page = doc[page_no-1]
        for link in page.links():
            if "uri" in link:
                url = link["uri"]
                result.append([page_no, url])
            else:
                pass
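One direction that might work, offered as a sketch and not a confirmed solution: each link dictionary returned by PyMuPDF has a "from" entry holding the rectangle of the clickable area, and page.get_text("words") returns every word with its bounding box. The anchor text could then be approximated by collecting the words whose centre lies inside the link rectangle. The names words_in_rect and extract_links are hypothetical helpers made up for this example, and the centre-point test is an assumption that may miss or over-include words when the link rectangle is drawn loosely.

```python
def words_in_rect(words, rect):
    """Join the words whose centre point falls inside rect.

    words: tuples shaped like PyMuPDF's page.get_text("words") output,
           i.e. (x0, y0, x1, y1, text, ...).
    rect:  (x0, y0, x1, y1) of the link's clickable area.
    """
    x0, y0, x1, y1 = rect
    picked = []
    for word in words:
        wx0, wy0, wx1, wy1, text = word[:5]
        cx = (wx0 + wx1) / 2
        cy = (wy0 + wy1) / 2
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            picked.append(text)
    return " ".join(picked)


def extract_links(path):
    """Return [page_no, url, anchor_text] for every URI link in the PDF."""
    import fitz  # PyMuPDF; imported here so words_in_rect works without it

    result = []
    with fitz.open(path) as doc:
        for page_no, page in enumerate(doc, start=1):
            words = page.get_text("words")
            for link in page.links():
                if "uri" in link:
                    # Assumption: link["from"] covers the visible anchor text.
                    anchor = words_in_rect(words, tuple(link["from"]))
                    result.append([page_no, link["uri"], anchor])
    return result
```

Calling extract_links(file) would then give one [page_no, url, anchor_text] entry per link. The anchor text comes back empty for links placed over images or over areas with no extractable text.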
Thanks!