
I have an MS SQL database that's used to capture bandwidth stats. We have a raw data table, and to improve reporting speed at different drill-down levels we aggregate and roll up data on an hourly, daily and weekly basis into separate tables.

Would a NoSQL database such as Mongo or Raven be a good candidate for this type of application?

Kev

1 Answer


Different NoSQL solutions solve different problems for different uses, so first off the best thing to do is look at your problem and break it down:

  • You are writing heavily to storage, therefore write speed is important to you
  • You want to perform aggregation operations on that data and have the results easily queryable
  • Read speed isn't that important from the sound of things, at least not in a "web application has to be really responsive for millions of people" kind of way
  • I don't know if you need dynamic queries or not

Let's look at Couch, Mongo and Raven in a very high-level, generalised way.

Raven

  • Fast writes
  • Fast queries (eventually consistent, pre-computed aggregation via map/reduce; see the sketch after this list)
  • Dynamic queries possible, but not really appropriate to your use case, as you're most likely going to be querying by date etc
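
To make the pre-computed aggregation point concrete, here is a rough sketch of defining a map/reduce index over raw bandwidth documents. This is not code from the question: it assumes the classic RavenDB HTTP index endpoint (PUT /indexes/<name> with Map/Reduce LINQ strings), made-up document and field names, and uses Python's requests library only to keep the example short; the exact endpoint and LINQ syntax vary by RavenDB version.

```python
# Rough sketch (assumptions: RavenDB's classic HTTP index endpoint, made-up
# collection name "BandwidthSamples" and fields "Timestamp"/"Bytes").
import requests

index_def = {
    "Map": (
        "from s in docs.BandwidthSamples "
        "select new { Hour = s.Timestamp.Date.AddHours(s.Timestamp.Hour), Bytes = s.Bytes }"
    ),
    "Reduce": (
        "from r in results "
        "group r by r.Hour into g "
        "select new { Hour = g.Key, Bytes = g.Sum(x => x.Bytes) }"
    ),
}

# Register the index; Raven then keeps the hourly totals up to date in the
# background, so queries against it return pre-computed (eventually consistent)
# results rather than re-aggregating the raw documents.
resp = requests.put("http://localhost:8080/indexes/BandwidthByHour", json=index_def)
resp.raise_for_status()
```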

Mongo

  • Blindingly fast writes (in my opinion dangerously so, because the power going off means losing data ;-))
  • Slow reads (relatively), aggregation via map/reduce, not pre-computed
  • Dynamic queries are just what you do, but you probably have to define indexes on your columns if you want any sort of performance on this sort of data (see the sketch after this list)
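
As a minimal sketch of those last two points (dynamic queries, but only fast if you index the fields you filter on), assuming pymongo and made-up collection/field names:

```python
# Sketch only: collection and field names ("bandwidth_raw", "ts", "bytes")
# are assumptions, not taken from the question.
from datetime import datetime
from pymongo import ASCENDING, MongoClient

client = MongoClient("mongodb://localhost:27017")
raw = client["stats"]["bandwidth_raw"]

# Without an index on the timestamp field, the date-range query below
# scans the whole collection.
raw.create_index([("ts", ASCENDING)])

# A "dynamic" query: one day's samples, summed client-side here for brevity.
cursor = raw.find({"ts": {"$gte": datetime(2011, 1, 19),
                          "$lt": datetime(2011, 1, 20)}})
print(sum(doc["bytes"] for doc in cursor))
```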

Couch

  • Fast writes
  • Fast-ish reads (pre-computed, but updated only when you read, IIRC)
  • Dynamic queries not possible; everything is pre-defined via map or map/reduce functions (see the sketch after this list)
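
For illustration, a pre-defined CouchDB view is a map (plus optional reduce) function stored in a design document, and you can only query what you defined up front. A small sketch using the couchdb Python package; the database, field and view names are assumptions, and the map function itself is JavaScript stored as a string:

```python
# Sketch only: names are illustrative, error handling omitted.
import couchdb

server = couchdb.Server("http://localhost:5984")
db = server.create("bandwidth")  # use server["bandwidth"] if it already exists

# Pre-defined view: roll raw samples up into hourly byte totals.
db.save({
    "_id": "_design/rollups",
    "views": {
        "bytes_by_hour": {
            # JavaScript map function; emits the sample's hour bucket as the key.
            "map": "function (doc) { emit(doc.ts.substring(0, 13), doc.bytes); }",
            "reduce": "_sum",
        }
    },
})

# The view is (re)computed lazily when it is queried, then cached.
for row in db.view("rollups/bytes_by_hour", group=True):
    print(row.key, row.value)
```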

So, basically: do you need dynamic queries over this sort of data? Is read speed incredibly important to you? If you need dynamic queries then you'll want Raven or Mongo (for this sort of thing, Couch is probably not what you are looking for anyway).

FWIW, Mongo's only use case in my opinion IS for logging, so you might have an answer there.

metdos
  • Nice answer, asking what the requirements will be before choosing :) – stack72 Jan 20 '11 at 11:57
  • @Rob: I understand that CouchDB queries aren't pre-computed, they're lazily computed on demand, and then cached. Aside from that, +1. – skaffman Jan 20 '11 at 11:57
  • Isn't that what I said? That's effectively pre-computed - additional queries will combine the results so far with any changed data "but updated only when you read" – metdos Jan 20 '11 at 11:58
  • I'll clarify, pre-defined is probably a better word anyway – metdos Jan 20 '11 at 11:59
  • Rob - thanks for the detailed answer. Ultimately there's a customer facing reporting app that draws graphs at different drilldown levels. There will never ever be a need to compute values in this web app which is why I aggregate this data to hourly, daily, weekly in a separate job. The rollup/aggregation process is the computationally expensive part but once a range of data is rolled up that's it. – Kev Jan 20 '11 at 12:56
  • Then you may as well use Raven and do that aggregation using map/reduce; you get to keep the original data but your results are pre-computed in the background – metdos Jan 20 '11 at 13:03
  • Although, if your existing solution works there is little benefit to switching – metdos Jan 20 '11 at 13:04
  • An option with MongoDB is to do real-time rollups using $inc and upsert - this lets you store much less data if you know how you want to roll up ahead of time. (I do 6-10 inserts per event into multiple rollup tables and it is lightning fast). Also, set up in a replica set you have minimal chance of data loss (and in 1.8 they are adding a log which will provide full durability) – James Avery Jan 20 '11 at 18:43
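
A rough sketch of that real-time rollup idea in pymongo; the collection and field names are illustrative, not taken from the comment:

```python
# Sketch only: upsert + $inc keeps hourly/daily buckets updated per event.
from datetime import datetime
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["stats"]

def record_event(host, ts, byte_count):
    hour = ts.replace(minute=0, second=0, microsecond=0)
    day = hour.replace(hour=0)
    # upsert=True creates the bucket document the first time a key is seen;
    # $inc then accumulates into it atomically on every later event.
    db.rollup_hourly.update_one({"host": host, "hour": hour},
                                {"$inc": {"bytes": byte_count, "events": 1}},
                                upsert=True)
    db.rollup_daily.update_one({"host": host, "day": day},
                               {"$inc": {"bytes": byte_count, "events": 1}},
                               upsert=True)

record_event("edge-01", datetime(2011, 1, 20, 18, 43), 1500)
```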