These two FilterName values produce different flat XML formats:
- OpenDocument Text Flat XML
- MS Word 2003 XML
I found these names by doing this:
- Enabled macro recording by going to Tools -> Options -> Advanced and checking "Enable Macro Recording".
- Started recording with Tools -> Macros -> Record Macro.
- Went to File -> Save As and selected various options for the file type.
- Named the macro, then checked the FilterName property in the resulting Basic code.
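As a rough sketch of how such a filter name might be used outside the macro recorder, the snippet below builds a headless LibreOffice command line of the form `--convert-to extension:FilterName`. The helper names `build_convert_command` and `convert` are made up for illustration, and it assumes the `soffice` executable is on your PATH. The `fodt` extension used with "OpenDocument Text Flat XML" is also an assumption; adjust it to whatever extension you want the output file to have.

```python
import shutil
import subprocess


def build_convert_command(input_path, extension, filter_name, out_dir="."):
    """Build a headless LibreOffice command line for one conversion.

    Hypothetical helper: the filter name is the FilterName value taken
    from the recorded macro.
    """
    return [
        "soffice",  # assumed to be on PATH
        "--headless",
        "--convert-to", f"{extension}:{filter_name}",
        "--outdir", out_dir,
        input_path,
    ]


def convert(input_path, extension, filter_name, out_dir="."):
    """Run the conversion if soffice can be found; return True on success."""
    if shutil.which("soffice") is None:
        return False
    command = build_convert_command(input_path, extension, filter_name, out_dir)
    return subprocess.run(command).returncode == 0
```

For example, `convert("in.odt", "fodt", "OpenDocument Text Flat XML")` would try to write a flat XML copy of `in.odt` to the current directory.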
Keep in mind that .odt and .docx are also XML-based formats; they are just zipped archives of XML files rather than a single flat file. It is possible to parse files in these formats by doing something like this:
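    import os
    import xml.dom.minidom
    import xml.parsers.expat
    import zipfile

    filepath = "in.odt"  # or "in.docx"
    tempDir = "path/to/temp/dir/"  # change according to your system
    # The main document part is content.xml in .odt and word/document.xml in .docx
    mainPart = "word/document.xml" if filepath.endswith(".docx") else "content.xml"

    # Unpack the zipped document so that its XML parts can be read from disk
    with zipfile.ZipFile(filepath, 'r') as zipper:
        zipper.extractall(tempDir)
    try:
        dom = xml.dom.minidom.parse(os.path.join(tempDir, mainPart))
    except xml.parsers.expat.ExpatError as exc:
        # handle exception, for example by reporting the malformed file
        print("Could not parse XML:", exc)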
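If you would rather not unpack anything to a temp directory, a variation on the same idea is to read the XML part straight out of the archive. This is only a sketch: `read_main_part`, `node_text`, `paragraph_texts` and `make_sample_odt` are hypothetical helpers, and the sample document is a tiny in-memory stand-in rather than a complete .odt file. It also assumes the paragraphs use the usual `text:` prefix, since minidom's `getElementsByTagName` matches on the prefixed tag name.

```python
import io
import xml.dom.minidom
import zipfile


def read_main_part(source, part_name="content.xml"):
    """Parse one XML part of a zipped document without unpacking it to disk."""
    with zipfile.ZipFile(source, "r") as archive:
        return xml.dom.minidom.parseString(archive.read(part_name))


def node_text(node):
    """Concatenate all text found below a DOM node."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(node_text(child) for child in node.childNodes)


def paragraph_texts(dom, tag="text:p"):
    """Return the text of each paragraph element, matched by prefixed tag name."""
    return [node_text(p) for p in dom.getElementsByTagName(tag)]


def make_sample_odt():
    """Build a minimal zip in memory that holds only a content.xml part."""
    content = (
        '<office:document-content'
        ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
        ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
        '<office:body><office:text>'
        '<text:p>Hello</text:p>'
        '<text:p>flat <text:span>XML</text:span></text:p>'
        '</office:text></office:body>'
        '</office:document-content>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


dom = read_main_part(make_sample_odt())
print(paragraph_texts(dom))
```

With a real file you would pass its path instead of the in-memory sample, and `part_name="word/document.xml"` together with `tag="w:p"` for a .docx.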