We are trying to convert .docx files (and later, possibly other file formats) into a kind of standard XML. This XML will then be transformed with XSLT into the XML format of our choice, defined by our own XSD.
For the conversion to be successful, we need to preserve as much of the document's information as possible. The most important elements are the document structure, the content, tables, lists, and figures (images, etc.).
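For context, a .docx file is a ZIP package whose main body lives in `word/document.xml` (WordprocessingML, part of Office Open XML). The sketch below uses only the Python standard library to walk that body and emit a simple intermediate XML that keeps headings, paragraphs, list items, tables, and figure positions. It is a minimal illustration under stated assumptions, not a complete converter: the output element names (`document`, `heading`, `para`, `listitem`, `table`, `row`, `cell`, `figure`) are a hypothetical intermediate format; headings are recognised only by the built-in style ids `Heading1`, `Heading2`, ... (custom or localised styles would need a lookup in `word/styles.xml`); list items are detected by the presence of `w:numPr`; and images are only marked, since their binaries sit under `word/media` and are linked through relationship files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _text(elem):
    """Concatenate the text of all w:t runs under elem."""
    return "".join(t.text or "" for t in elem.iter(f"{{{W}}}t"))


def _paragraph(p):
    """Map one w:p to a hypothetical intermediate element."""
    style = p.find("w:pPr/w:pStyle", NS)
    style_id = style.get(f"{{{W}}}val", "") if style is not None else ""
    if style_id.startswith("Heading"):
        # Assumption: built-in heading style ids such as "Heading1", "Heading2"
        level = style_id[len("Heading"):] or "1"
        el = ET.Element("heading", level=level)
    elif p.find("w:pPr/w:numPr", NS) is not None:
        # Assumption: a numbering reference means the paragraph is a list item
        el = ET.Element("listitem")
    else:
        el = ET.Element("para")
    el.text = _text(p)
    if p.find(f".//{{{W}}}drawing") is not None:
        # Only the position is kept; the image binary is in word/media
        ET.SubElement(el, "figure")
    return el


def _table(tbl):
    """Map a w:tbl to rows and cells (cell paragraphs are simply concatenated)."""
    el = ET.Element("table")
    for tr in tbl.findall("w:tr", NS):
        row = ET.SubElement(el, "row")
        for tc in tr.findall("w:tc", NS):
            ET.SubElement(row, "cell").text = _text(tc)
    return el


def docx_to_intermediate(src):
    """Convert a .docx (path, file object, or raw bytes) to intermediate XML."""
    if isinstance(src, bytes):
        src = io.BytesIO(src)
    with zipfile.ZipFile(src) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    body = root.find("w:body", NS)
    doc = ET.Element("document")
    for child in body:
        if child.tag == f"{{{W}}}p":
            doc.append(_paragraph(child))
        elif child.tag == f"{{{W}}}tbl":
            doc.append(_table(child))
    return doc
```

`ET.tostring(docx_to_intermediate("input.docx"), encoding="unicode")` would then give a string that an XSLT stylesheet could map onto the target schema. Keeping the intermediate format small like this means each new input format only needs its own reader, while the XSLT step stays the same.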
We have realised that this job is complex, and that there are serious restrictions on what kinds of documents we can support.
Since there are different standards, implementing a separate converter for each of them would be time-consuming.
Does anyone have experience with converting documents to XML? Any tips on how to proceed?