
I'm about to start a new project, basically a reporting tool, which will have a very large database.

The number of tables will not be large (<200); the majority of the data (80%) will be contained in 20 tables, and the data is almost entirely insert/read-only (no updates).

The data in that one table is estimated to grow at 240,000 records per minute, and we should keep at least 1 to 3 years of it to be able to run various reports, which an administrator will view online.
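For scale, that growth rate is worth sanity-checking with some quick arithmetic:

```python
# Back-of-the-envelope volume estimate for 240,000 records/minute.
RECORDS_PER_MINUTE = 240_000

per_day = RECORDS_PER_MINUTE * 60 * 24   # records per day
per_year = per_day * 365                 # records per year
three_years = per_year * 3               # the full retention window

print(f"{three_years:,} records after 3 years")
# → 378,432,000,000 records after 3 years
```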

I don't have first-hand experience with databases that large, so I'm asking those who do: which DB is the best choice in this situation? I know that Oracle is the safe bet, but I'm more interested in whether anyone has experience with alternatives such as HadoopDB or Google's Bigtable. Please guide me. Thanks in advance.

jenitshah
    Do you *really* need to keep all the data? Can you aggregate it somehow? Merge the raw data into buckets, perhaps? 4000 records per second is a little bit insane. How big are your records? – Greg Hewgill Apr 02 '12 at 06:18
  • To anyone who goes straight for the calculator, I'll save you a little effort: 378.432 billion records after 3 years. – Corbin Apr 02 '12 at 06:19
  • @GregHewgill thanks for your response. Yes, since my project is a reporting tool, if the user wants to see a report covering more than a year, I have to keep that data for up to 3 years. I do aggregate data, but the final reports show data in single-row form, not aggregated form. – jenitshah Apr 02 '12 at 06:27

2 Answers


Oracle is going to get very expensive to scale up enough. MySQL will be hard to scale. It's not their fault; an RDBMS is overkill for this.

Let me start with a dumb question: what are you doing with this data? "Various reports" could be a lot of things. If these reports can be generated in bulk, offline, then why not keep your data in flat files on a shared file system?

If it needs to be more online, then yes the popular wisdom from the past 2 years is to look at NoSQL databases like Mongo, Couch and Cassandra. They're simpler, faster creatures that scale easily and provide more random access to your data.

Doing analytics on NoSQL is all the rage this year. For example, I'd look at what Acunu is doing to embed analytics into their flavor of Cassandra: http://www.acunu.com/blogs/andy-twigg/acunu-analytics-preview/
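Greg's suggestion of merging the raw data into buckets is the key idea here: since NoSQL stores generally lack aggregation primitives, a common pattern is to roll raw events into per-minute counters at ingest time, so reports scan a few thousand pre-aggregated rows instead of billions of raw ones. A minimal in-memory sketch in Python (the event fields `timestamp`, `metric`, and `value` are illustrative; in practice each bucket would be upserted into whatever store you pick):

```python
from collections import defaultdict
from datetime import datetime

# Per-(minute, metric) buckets: each holds a running count and sum,
# so a report over a time range reads buckets, not raw events.
buckets = defaultdict(lambda: {"count": 0, "total": 0.0})

def ingest(event):
    """Fold one raw event into its per-minute bucket."""
    ts = datetime.fromisoformat(event["timestamp"])
    minute = ts.replace(second=0, microsecond=0)
    key = (minute.isoformat(), event["metric"])
    bucket = buckets[key]
    bucket["count"] += 1
    bucket["total"] += event["value"]

# Two raw events landing in the same minute collapse into one bucket row.
ingest({"timestamp": "2012-04-02T06:18:30", "metric": "requests", "value": 1.0})
ingest({"timestamp": "2012-04-02T06:18:45", "metric": "requests", "value": 2.0})

print(buckets[("2012-04-02T06:18:00", "requests")])
# → {'count': 2, 'total': 3.0}
```

At 240,000 records per minute this turns a year of raw data into roughly half a million bucket rows per metric, which any of the stores mentioned above can scan comfortably.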

Sean Owen
  • Hello, thank you for your response. I have to provide online reporting to the administrator, so I need a more online DB. I usually aggregate the data, and from my preliminary findings I gather that NoSQL is very slow at aggregating data. Is that true? – jenitshah Apr 02 '12 at 10:21
  • NoSQL databases don't, by nature, have any aggregation primitives. This is exactly why you should look at stuff like Acunu, since they're building in proper incremental real-time analytics. I know the guys, and what they're doing is pretty great for this use case. – Sean Owen Apr 02 '12 at 10:43

You can also use Apache Solr or MongoDB. Both are used for handling big data in the NoSQL world, and they are very fast at inserting and retrieving data.

Vishal Ranapariya