I'm relatively new to NoSQL databases and I have to evaluate different NoSQL solutions for a monitoring tool.

The situation is the following: each record is only about 100 bytes, but there are a lot of them. We get about 15 million records per day, so I'm currently testing with 900 million records (about 15 GB as an SQL insert script).

My question is: does CouchDB fit my needs? I need to do range queries (on the date the records were created) and sum up some of the columns according to groups defined by "secondary indexes" stored in the record. I know that MapReduce is probably the best way to calculate that, but can CouchDB's JavaScript do this in an acceptable time?

I already tried MongoDB, but its MapReduce performed really poorly. I have also read about HBase and Cassandra, but maybe CouchDB is a good option too.

I hope I gave you all the needed information... Thank you for your help!

andy

JasonSmith
andy
  • 1
    First, the only way to know performance is to measure it, as there are too many variables to guess. Second, be not too attracted to structured storage when a half-century of RDBM experience is waiting to handle your 100 octet data. I'm guessing at 100B/row, your data isn't very variant (where SS excels). – msw Jul 04 '11 at 11:47
  • Good points, @msw. Of course, the way to *definitively* know performance is measurement; however I suppose it is valid to ask for first-approximation, ballpark estimates. I modified the question title to be a bit more black-and-white. (Not sure if you voted to close or that was someone else, but IMHO it's a fair question.) Finally, totally right about RDBM. They are more valuable than we give credit. – JasonSmith Jul 05 '11 at 04:24
  • The data I'm evaluating is currently handled by a really powerful SQL server, but it can't handle the requests sent by users to extract information from the mass of data. It simply takes too much time. That's why we are looking for NoSQL solutions that can scale horizontally. – andy Jul 13 '11 at 12:26

1 Answer

Frankly, at this time, unless you have very good hardware, Apache CouchDB may run into problems. Map/reduce will probably be fine. CouchDB's incremental map/reduce is ideal for your requirements.

As a developer, you will love it! Unfortunately, as a sysadmin, you may notice more disk usage and I/O than expected.

I suggest trying it. Since it's all HTTP and JavaScript, a feasibility test is easy to do. Just remember that the initial view build will take a long time (let's assume, for the sake of argument, that it takes longer than every other competing database). But that time will never be spent again: map/reduce runs only once per document (actually, once per document update).
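
To make the feasibility test a bit more concrete, here is a minimal sketch of what such a view could look like. It is not a tested production design: the field names (`created`, `group`, `value`) and the key layout are assumptions about your records. The `emit` function and `sumForGroup` below are local stand-ins so the snippet runs outside CouchDB; inside a design document you would keep only the body of `map` and use the built-in `_sum` reduce.

```javascript
// Local stand-in for CouchDB's view engine: collects the rows a map function emits.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function in the form CouchDB expects. The fields created, group and
// value are hypothetical; adapt them to the real records.
function map(doc) {
  const d = new Date(doc.created);
  // Key layout [group, year, month, day] allows a date range query per group.
  emit([doc.group, d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate()], doc.value);
}

// Compares two [year, month, day] arrays; negative if a is earlier than b.
function compareDate(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Local equivalent of querying the view with the _sum reduce,
// startkey=[group, ...from], endkey=[group, ...to] and group_level=1.
function sumForGroup(group, from, to) {
  return rows
    .filter((r) => r.key[0] === group)
    .filter((r) => compareDate(r.key.slice(1), from) >= 0)
    .filter((r) => compareDate(r.key.slice(1), to) <= 0)
    .reduce((acc, r) => acc + r.value, 0);
}

// Made-up sample records, only for illustration.
const docs = [
  { created: "2011-07-04T11:47:00Z", group: "web-01", value: 5 },
  { created: "2011-07-05T04:24:00Z", group: "web-01", value: 7 },
  { created: "2011-07-13T12:26:00Z", group: "web-01", value: 11 },
  { created: "2011-07-05T08:44:00Z", group: "db-01", value: 3 },
];
docs.forEach(map);

console.log(sumForGroup("web-01", [2011, 7, 1], [2011, 7, 7]));
```

With this key layout, each group needs its own range query. If you would rather sum across all groups for a date range, putting the date parts first in the key is the usual alternative.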

DharmaTurtle
JasonSmith
  • 3
    +1 However, it is fair to note that "never" here means "until some change to the design document provokes a rebuild of the view." Just to get you prepared for this... :) – Marcello Nuccio Jul 05 '11 at 08:44
  • 6
    For production use, there is a solution to that. If you ask how, I will be glad to give details. Short version: Send new design doc with a different id. Query it to build the index. When complete, use HTTP COPY to rename the new one over the old one. Atomic upgrade, no downtime. – JasonSmith Jul 05 '11 at 09:26
  • +1 'tis a fair question and a fair answer (and I try to be gentle to the newer members, so no close vote from me without explanation (since you asked obliquely)). – msw Jul 05 '11 at 10:23
  • @jhs I was about to ask you for more details, when Sean Copenhaver sent to CouchDB users list a link to a [wiki page with exactly this information](http://wiki.apache.org/couchdb/How_to_deploy_view_changes_in_a_live_environment)... funny :-) – Marcello Nuccio Jul 06 '11 at 07:39
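
The zero-downtime view upgrade described in the comments above can be sketched as a sequence of three HTTP requests. This is only an illustration of the steps, not the wiki page's exact procedure: the function `viewUpgradePlan` and all of its parameters (database name, design document ids, view name, revision) are hypothetical, and the snippet only builds the request descriptions without sending anything.

```javascript
// Builds the HTTP requests for swapping in a design document whose index
// has already been built. Every name passed in is a hypothetical example.
function viewUpgradePlan(db, liveId, stagingId, viewName, designDoc, liveRev) {
  return [
    // 1. Upload the new design document under a different id.
    { method: "PUT", path: `/${db}/_design/${stagingId}`, body: designDoc },
    // 2. Query one of its views so CouchDB builds the index.
    { method: "GET", path: `/${db}/_design/${stagingId}/_view/${viewName}?limit=1` },
    // 3. Once the build has finished, copy the new document over the old one.
    {
      method: "COPY",
      path: `/${db}/_design/${stagingId}`,
      headers: { Destination: `_design/${liveId}?rev=${liveRev}` },
    },
  ];
}

const plan = viewUpgradePlan(
  "monitoring",
  "stats",
  "stats_new",
  "by_group_and_day",
  { views: { by_group_and_day: { map: "function (doc) { /* ... */ }", reduce: "_sum" } } },
  "1-abc"
);
plan.forEach((req) => console.log(req.method, req.path));
```

The second request blocks until the index is built, which on a large database can take a long time, so a real script would likely need generous timeouts or a polling loop before issuing the `COPY`.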