
I've got an application that runs on NodeJS and uses MongoDB as its database.

Currently I hook into MongoDB via the MongoJS module, which aims to "emulate the official mongodb API as much as possible".

The application gets about 20,000 objects and saves each one to MongoDB. First, it looks up the database to see whether the object already exists, then it either updates the existing entry or adds a new entry.
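Roughly, the save step looks like this (a simplified sketch; the db/collection/field names stand in for my real code):

```js
var mongojs = require('mongojs');
var db = mongojs('mydb', ['products']); // placeholder names

function saveItem(item, done) {
  // Step 1: check whether the object is already in the collection
  db.products.findOne({ name: item.name, store: item.store }, function (err, existing) {
    if (err) return done(err);
    if (existing) {
      // Step 2a: update the existing entry
      db.products.update({ _id: existing._id }, { $set: item }, done);
    } else {
      // Step 2b: add a new entry
      db.products.insert(item, done);
    }
  });
}
```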

This can be quite slow. I'm not sure whether it's because MongoJS is synchronous/single-stream (if it even is; I'm honestly not sure), or whether it's just the reality of writing a lot of entries to a DB, but it takes 45 min to 1 hr to do all of this, and I'd obviously like to cut that down as much as possible.

I'm wondering whether Mongoose would be a better/faster option for this. Apparently it's asynchronous; I don't know whether that would have any effect on performance.

JVG

2 Answers


> First, it looks up the database to see whether the object already exists, then it either updates the existing entry or adds a new entry.

You can let MongoDB do this for you in a single `update` command with the `upsert` (update-or-insert) option set:

http://docs.mongodb.org/manual/reference/method/db.collection.update/#db.collection.update

Try that first and see if things speed up for you.
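For example, with MongoJS the lookup-then-write pair collapses into one round trip per object. A minimal sketch, using placeholder db/collection names, where `item` is one of your objects:

```js
var mongojs = require('mongojs');
var db = mongojs('mydb', ['products']); // placeholder names

// Update the matching document, or insert it if nothing matches
db.products.update(
  { name: item.name, store: item.store }, // match criteria
  { $set: item },                         // fields to write
  { upsert: true },                       // insert on no match
  function (err) {
    if (err) throw err;
  }
);
```

That replaces the find-then-update/insert round trips with a single command per object.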

martingreber
  • Oh, very interesting! I do `update` already if the result exists, I didn't realise it could be done at once. – JVG Jul 17 '13 at 02:54
  • Yeah, it's pretty handy :) Now if that doesn't improve the performance much, you should take a look at the query you're passing to mongo. Make sure you're searching on an indexed field. – martingreber Jul 17 '13 at 03:04
  • The criteria I'm checking against is: `{"name": item.name, "designer": item.designer, "imgs": item.imgs[0], "store": item.store}`. None of these use wildcards, so I assume they're all indexed? – JVG Jul 17 '13 at 03:20
  • It's still only going at about 1 to 5 entries a second (about 1 hour to finish 20,000 upserts). – JVG Jul 17 '13 at 04:00
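A note on the index question in the comments above: MongoDB only indexes `_id` automatically, so fields used in the match criteria need an explicit index, otherwise every upsert scans the whole collection. A sketch using `ensureIndex` with the fields from the comment (placeholder names again):

```js
var mongojs = require('mongojs');
var db = mongojs('mydb', ['products']); // placeholder names

// Compound index covering the upsert criteria; without it each
// lookup is a full collection scan, which hurts at 20,000 docs
db.products.ensureIndex(
  { name: 1, designer: 1, imgs: 1, store: 1 },
  function (err) {
    if (err) throw err;
  }
);
```

Since `imgs` is an array, this becomes a multikey index; an equality query like `imgs: item.imgs[0]` can still use it.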

My guess is the bottleneck is your application getting the objects. If you were to synthesize fake objects in RAM and slam them into mongo, you would probably see more like several hundred objects going into the DB per second (a rough guess, but at least two orders of magnitude more than the 1/s you claim). In either case, FYI, both MongoJS and Mongoose are asynchronous, as is pretty much every node database API. In fact, AFAIK there is simply no synchronous networking API in node, so while it's possible to wrestle node into synchronous filesystem I/O, all networking in node is, as far as I know, inherently asynchronous.

Either way, Mongoose adds a small amount of overhead, so it's likely to be technically slower than MongoJS, but not by any meaningful amount (think 5.1 seconds to insert 20K records vs 5.0 seconds).
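One way to check where the time goes is to take the data source out of the loop entirely: generate fake objects in RAM and time the inserts alone. A rough sketch (placeholder names; firing everything at once is fine for a crude measurement):

```js
var mongojs = require('mongojs');
var db = mongojs('mydb', ['products']); // placeholder names

var total = 20000;
var remaining = total;
var start = Date.now();

for (var i = 0; i < total; i++) {
  // Synthetic object, roughly the shape of the real data
  db.products.insert({ name: 'item-' + i, store: 'test', imgs: [] }, function (err) {
    if (err) throw err;
    if (--remaining === 0) {
      console.log(total + ' inserts in ' + (Date.now() - start) + 'ms');
      db.close();
    }
  });
}
```

If that finishes in seconds rather than an hour, the database isn't the bottleneck.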

Peter Lyons
  • Great response. I'm marking this as the correct answer regardless, but can you elaborate on the bottleneck and what steps I might take to fix it? The mongo function is called in a `for` loop that goes through all my products/objects; I've got `process.setMaxListeners(0);` but don't know what else to try or where I'm bottlenecking. – JVG Jul 17 '13 at 04:16
  • Where is the data coming from? That's where I suspect the bottleneck is. But you also might want to consider how many concurrent insert operations your mongod can handle, and maybe throttle your inserts accordingly. Hard to speculate further without more concrete data about your situation. – Peter Lyons Jul 17 '13 at 04:33
  • I see. The data's coming from a web scraper, so that is most likely the bottleneck! I was just wanting to see whether the actual DB writing could be improved upon, but it looks like Mongo is as fast as I can make it. Cheers for your help. – JVG Jul 17 '13 at 04:37
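For completeness, on the throttling point above: if the write side ever does become the limiting factor, the upserts can be capped so mongod isn't flooded with 20,000 concurrent operations. A sketch using the `async` module's `eachLimit` (the library choice is my assumption; any concurrency limiter works, and `items` stands for the array of scraped objects):

```js
var async = require('async');
var mongojs = require('mongojs');
var db = mongojs('mydb', ['products']); // placeholder names

// At most 10 upserts in flight at any time
async.eachLimit(items, 10, function (item, done) {
  db.products.update(
    { name: item.name, store: item.store },
    { $set: item },
    { upsert: true },
    done
  );
}, function (err) {
  if (err) throw err;
  console.log('all ' + items.length + ' upserts finished');
  db.close();
});
```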