I'm working on a web application where users can upload Microsoft Office document files. Our server runs Node.js with Express.js and is hosted on Heroku. Because of the hosting, I don't think I can install programs such as AbiWord or catdoc. I can handle the file uploads, but I can't parse the contents of the documents.
How can I read the contents of a .doc file? The extracted content will then be stored in a database. It would be nice to preserve basic formatting (bold, italic, underline), but that isn't essential.