I'm looking for tips on how to improve the database performance in the following situation.
As a sample application, I wrote a fairly simple app today that uses the Twitter streaming API to search for certain keywords and stores the results in MongoDB. The app is written in Node.js.
I'm storing two collections. One stores a keyword and an array of tweet IDs referencing each tweet found that mentions that keyword. These are added to the database using .update() with {upsert: true} so that new IDs are appended to the 'ids' array.
A sample document from this collection looks like this:
{ "_id": ObjectId("4e00645ef58a7ad3fc9fd9f9"), "ids": ["id1","id2","id3"], "keyword": "#chocolate" }
Update code:
keywords.update({keyword: key_word}, {$push: {ids: id}}, {upsert: true}, function(err) { if (err) console.error(err); });
Documents in the second collection look like this and are added simply with .save():
{
"twt_id": "id1",
"tweet": { //big chunk of json that doesn't need to be shown }
}
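For reference, the save call is roughly this (tweets is what I named the collection, and the variable names here are approximate):

tweets.save({twt_id: tweet.id_str, tweet: tweet}, function(err) {
    if (err) console.error(err); // log failed inserts rather than dropping them silently
});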
I've got this running on my MacBook right now and it's been going for about two hours. I'm storing a lot of data, probably several hundred documents per minute; the object count in MongoDB is now past 120k.
What I'm noticing is that CPU usage for the database process is hitting as high as 84% and has been climbing steadily since I started the latest test run.
I was reading up on setting indexes, but since I'm mostly adding documents and not running queries against them, I wasn't sure indexes would help. Then it occurred to me that update() with {upsert: true} has to look up the matching document before it can $push, which is effectively a query on every insert, so an index on the keyword field might help with that.
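If that's the case, I assume I'd create the index with something like this (untested; I'm guessing unique is safe here since the upsert gives each keyword exactly one document):

keywords.ensureIndex({keyword: 1}, {unique: true}, function(err, indexName) {
    if (err) console.error(err);
});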
What should I be looking at to keep MongoDB from eating up ever increasing amounts of CPU?