I needed a quick solution to make text surrounded by "smart tags" visible to docx's text property, and found that the solution could also be adapted to make some tracked changes visible.
It uses lxml.etree.strip_tags to remove surrounding "smartTag" and "ins" tags, and promote the contents; and lxml.etree.strip_elements to remove the whole "del" elements.
def para2text(p, quiet=False):
if not quiet:
unsafeText = p.text
lxml.etree.strip_tags(p._p, "{*}smartTag")
lxml.etree.strip_elements(p._p, "{*}del")
lxml.etree.strip_tags(p._p, "{*}ins")
safeText = p.text
if not quiet:
if safeText != unsafeText:
print()
print('para2text: unsafe:')
print(unsafeText)
print('para2text: safe:')
print(safeText)
print()
return safeText
docin = docx.Document(filePath)
for para in docin.paragraphs:
text = para2text(para)
Beware that this only works for a subset of "tracked changes", but it might be the basis of a more general solution.
If you want to see the xml for a docx file directly: rename it as .zip, extract the "document.xml", and view it by dropping into chrome or your favourite viewer.