21

I'm working on a web application where users can upload Microsoft Office documents. Right now, our server runs Node.js with Express.js and is hosted on Heroku. Because of this, I don't think I can install programs such as abiword or catdoc. I can handle the file uploads, but I can't parse the contents of the document.

How can I read the contents of the doc file? The information will then be put into a database. It'd be nice to preserve basic formatting (bold, italic, underline), but not essential.

arknave

4 Answers

7

While there doesn't seem to be anything on NPM that handles Word directly, you might be able to use a REST API to do the conversion via another cloud service. For example, Saaspose (they of the famous Aspose tools) have a public API for Word, Excel, PDF, and others. They list Node.js, JavaScript, and Heroku support on their page.

EDIT:

I see that Saaspose is now called Aspose for Cloud

Another API that claims something similar is Doxument

explunit
5

The office package (npm install office) seems to provide at least part of the answer. I use it to read Excel files, but so far I have not tried any Word docs.

Deer Hunter
  • 2
    Note: this package seems to convert the input to HTML by running unoconv http://dag.wieers.com/home-made/unoconv/ (the OpenOffice converter). (For spreadsheets it seems to convert to xls with unoconv, then convert xls->html using http://freecode.com/projects/xlhtml ) – Nickolay Jan 12 '13 at 08:25
  • 5
    And unoconv requires [libreoffice](http://www.macupdate.com/app/mac/35446/libreoffice). How deep does the rabbit hole go? I don't think this solution would scale very well unless you want to install all these applications on all your servers, which is quite a task, frankly. – abbood Apr 03 '13 at 05:22
3

You can use mammoth to parse .docx files https://www.npmjs.com/package/mammoth and xlsx to parse .xlsx files https://github.com/SheetJS/js-xlsx
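As a rough illustration of the mammoth route, here is a minimal sketch that converts an uploaded .docx buffer to HTML. It assumes mammoth has been installed with npm install mammoth; the helper names isDocx and docxToHtml are made up for this example. Note that mammoth targets the zip-based .docx format only, so legacy binary .doc uploads would still need another route.

```javascript
// Sketch only: assumes `npm install mammoth`. The helper names are illustrative,
// not part of mammoth itself.
const path = require('path');

// mammoth reads the zip-based .docx format, not the legacy binary .doc format,
// so check the extension before attempting a conversion.
function isDocx(filename) {
  return path.extname(filename).toLowerCase() === '.docx';
}

// Takes the uploaded file as a Buffer and resolves to the generated HTML
// plus any warnings mammoth reported during conversion.
async function docxToHtml(buffer) {
  // Loaded lazily so that isDocx can be used without mammoth installed.
  const mammoth = require('mammoth');
  const result = await mammoth.convertToHtml({ buffer: buffer });
  return { html: result.value, warnings: result.messages };
}
```

In an Express upload handler you would pass the uploaded file's buffer to docxToHtml and store the returned html string in the database. Bold and italic should come through as strong and em tags; as far as I can tell from the mammoth README, underline is ignored by default unless you supply a style map.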

ZhenyaUsenko
2

There doesn't seem to be any yet. See below for something that might help.

Can I read PDF or Word Docs with Node.js?

Community
LiamB