I am working on a project that requires extracting data from all sorts of documents. To do this, I need to convert each document to XML so that I can later parse it with a SAX parser. How do I get the XML equivalent of an MS Office document?
-
If the document is saved in one of the newer formats (e.g. `.docx` for Word), then it is already zipped XML. – Richard Aug 10 '15 at 09:12
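To illustrate the point about `.docx` being zipped XML, here is a minimal sketch in Python (the question does not name a language, so that choice is an assumption). It opens the archive, reads the `word/document.xml` part, and streams it through a namespace-aware SAX parser. The names `DocxTextHandler` and `docx_paragraphs` are hypothetical helpers made up for this example, and the sketch only collects paragraph text from the main document part. It ignores headers, footnotes, tables of other parts, and the older binary `.doc` format, which is not XML at all.

```python
import io
import zipfile
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

# Namespace URI used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


class DocxTextHandler(ContentHandler):
    """Hypothetical SAX handler that collects the text of each paragraph."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._current = []
        self._in_text = False

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) tuple in namespace mode.
        if name == (W_NS, "t"):
            self._in_text = True

    def characters(self, content):
        # Only keep character data that sits inside a text run element.
        if self._in_text:
            self._current.append(content)

    def endElementNS(self, name, qname):
        if name == (W_NS, "t"):
            self._in_text = False
        elif name == (W_NS, "p"):
            # A paragraph has closed: join its runs into one string.
            self.paragraphs.append("".join(self._current))
            self._current = []


def docx_paragraphs(source):
    """Return paragraph texts from a .docx path or binary file-like object."""
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    handler = DocxTextHandler()
    parser = xml.sax.make_parser()
    # Match elements by namespace URI rather than by the "w:" prefix.
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.paragraphs
```

Calling `docx_paragraphs("report.docx")` (the file name is a placeholder) would return a list with one string per paragraph. Matching on the namespace URI instead of the literal `w:` prefix is a deliberate choice, since the prefix is only a convention and a producer is free to pick a different one.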
-
Which file formats are you working with? Something like `.doc` is quite different to something like `.xlsx`. Also, what kind of XML? HTML? Other? – Gagravarr Aug 10 '15 at 11:39