
The Word document I am trying to read contains hyperlinks, colored text, etc. A few of the hyperlinks cause the following error:

[Screenshot of the error message]

If I open the Word file, remove the hyperlinks manually with the "Remove Hyperlinks" option, and save it back, reading it works fine.

I need to do the same in Python: remove the hyperlinks while keeping their text as it is, then save the document for further processing.

I have tried several things, such as detecting the links via `docx.Document`, but it does not pick up the links. I was able to iterate over the document element by element:

```python
from docx import Document
from docx.oxml.ns import qn

# Load the Word document
file_path = "../.docx"
doc = Document(file_path)

# Iterate over the top-level elements of the document body
for element in doc.element.body:
    # Handle paragraphs only (tables and section properties are skipped)
    if element.tag == qn("w:p"):
        # iter() visits all descendants, including runs nested in <w:hyperlink>
        for run in element.iter(qn("w:r")):
            for text_element in run.iter(qn("w:t")):
                text = text_element.text
                if text is None:
                    continue
                # Process 'text' here
                print("Processed paragraph text:", text)

                if "sample-hyperlink" in text:
                    print("length", len(text))
                    text = text.strip()

                # Update the run's text (this does not remove the link)
                text_element.text = text
```

When I find the hyperlink text, I can replace it with the same text, but the hyperlink remains.

Is there any way to remove the hyperlinks from all of the text in the Word document while keeping the text itself?

Rohit Kumar Singh
