
What is the best tool for extracting text and inline formatting (bold, italic, and so on) from a Word 2010 docx file, if the goal is to transform the Open XML into a simpler XML format?

One idea that comes to mind is to convert the docx to another format first. If so, which format would you suggest, and which program (preferably open source) would you use for the conversion?

Any other ideas (that is, different approaches)? Many tools still seem to target MS Office 2007. Are XPath, XQuery and XSLT the way to go, and if so, why?
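To make the goal concrete, here is a minimal sketch of the kind of transformation I have in mind, written in Python with only the standard library. It relies on the fact that a .docx file is a ZIP archive whose main body is stored in `word/document.xml`. The function names (`simplify`, `simplify_docx`) and the output vocabulary (`doc`, `p`, `run`, with `bold` and `italic` attributes) are made up for illustration, and the sketch only looks at runs that are direct children of a paragraph and at formatting set directly on the run, not formatting inherited from styles.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _is_on(rpr, tag):
    """Return True if the toggle property w:<tag> is present and not switched off."""
    if rpr is None:
        return False
    el = rpr.find(f"w:{tag}", NS)
    if el is None:
        return False
    # <w:b/> means "on"; <w:b w:val="0"/> (or "false"/"off") means "off".
    return el.get(f"{{{W}}}val", "true") not in ("0", "false", "off")


def simplify(document_xml):
    """Convert the content of word/document.xml into a small, hypothetical XML format."""
    src = ET.fromstring(document_xml)
    doc = ET.Element("doc")
    for p in src.iter(f"{{{W}}}p"):
        out_p = ET.SubElement(doc, "p")
        # Only direct child runs; runs nested in hyperlinks etc. are skipped here.
        for r in p.findall("w:r", NS):
            text = "".join(t.text or "" for t in r.findall("w:t", NS))
            if not text:
                continue
            rpr = r.find("w:rPr", NS)
            run = ET.SubElement(out_p, "run")
            if _is_on(rpr, "b"):
                run.set("bold", "true")
            if _is_on(rpr, "i"):
                run.set("italic", "true")
            run.text = text
    return ET.tostring(doc, encoding="unicode")


def simplify_docx(path):
    """Read word/document.xml out of a .docx archive and simplify it."""
    with zipfile.ZipFile(path) as archive:
        return simplify(archive.read("word/document.xml"))
```

Calling `simplify_docx("example.docx")` (the file name is a placeholder) would return the simplified XML as a string. The same selection logic could presumably be written as an XSLT stylesheet matching `w:p`, `w:r` and `w:t`, which is part of why I am asking whether that is the better route.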

Please be patient: I'm a beginner at this, and I would also gladly welcome pointers to concise sources of knowledge.

xlixol

  • What is your objective? An obvious format would be XHTML. Why 2010? If you are really interested in the few 2010-specific features, this is probably at odds with your search for a less complex format! – JasonPlutext Jul 24 '12 at 22:24
