The Word document I am trying to read contains hyperlinks, colored text, etc. A few of the hyperlinks cause the following error.
When I open the file in Word, remove the hyperlinks manually with the "Remove Hyperlinks" option, and save it again, it works fine.
I need to disable the hyperlinks from Python while keeping their text unchanged, and then save the document for further processing.
I tried several things, such as detecting the links via docx.Document, but it fails to read them. I was able to iterate over the document element by element:
from docx import Document

# Load the Word document
file_path = "../.docx"
doc = Document(file_path)

# Iterate through paragraphs, tables, and hyperlinks
for element in doc.element.body:
    # Handle paragraphs
    if element.tag.endswith('p'):
        for run in element.findall('.//w:r', namespaces=element.nsmap):
            text_element = run.find('.//w:t', namespaces=run.nsmap)
            if text_element is not None and text_element.text is not None:
                text = text_element.text
                # Process 'text' here
                # Print processed text
                print("Processed paragraph text:", text)
                if "sample-hyperlink" in text:
                    print("length", len(text))
                    text = text.strip()
                    # Update run text
                    text_element.text = text
When I find the hyperlink text, I can replace it with the same text, but the hyperlink stays enabled.
Is there any way to disable or remove the hyperlinks from all text in the Word document?