8

Can RethinkDB handle large data sets (i.e., multiple terabytes) effectively enough to serve as the database for an analytics application?

JE42

1 Answer

13

Disclaimer: I'm one of the founders of RethinkDB. Sorry for a longish answer -- the question is surprisingly nuanced.

RethinkDB is designed with a very flexible architecture that can scale from small instances to large clusters holding large amounts of data (definitely TB+), and it can efficiently run a wide variety of queries (OLTP, OLAP, etc.).

However, in practice we're currently focusing on the real-time aspects of the system -- most of the optimizations we're doing are around the needs of real-time applications built on top of RethinkDB. These are typically OLTP-ish workloads. We will absolutely get to optimizing OLAP-style workloads, but it isn't a top priority right now.

The best way to find out whether Rethink will work for you is to take it for a spin, and do some load-testing. You should be able to find out pretty quickly how well things work. (If you do and happen to run into issues, please let us know about them -- we'll be happy to help you out and fix any potential problems).
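As a rough illustration of what such a load test could look like, here is a minimal Python sketch that times repeated calls to a query function and reports throughput and latency percentiles. Everything in it is an assumption for illustration: `run_load_test` and `fake_query` are hypothetical names, and `fake_query` is a stand-in that does no database work. To test a real cluster you would replace it with a function that runs one of your own queries through the RethinkDB driver.

```python
import statistics
import time


def run_load_test(operation, iterations=1000):
    """Call `operation(i)` repeatedly and summarize per-call latency in ms."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")

    latencies_ms = []
    started = time.perf_counter()
    for i in range(iterations):
        t0 = time.perf_counter()
        operation(i)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - started

    latencies_ms.sort()
    # Index of the 95th-percentile sample, clamped to the last element.
    p95_index = min(len(latencies_ms) - 1, int(0.95 * len(latencies_ms)))
    return {
        "iterations": iterations,
        "ops_per_sec": iterations / elapsed if elapsed > 0 else float("inf"),
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "max_ms": latencies_ms[-1],
    }


def fake_query(i):
    # Hypothetical stand-in for a real query. With the RethinkDB Python
    # driver this might be something like
    #   r.table("events").get(i).run(conn)
    # where the table name and the open connection `conn` are assumptions.
    sum(range(1000))


if __name__ == "__main__":
    print(run_load_test(fake_query, iterations=200))
```

A single-threaded loop like this only measures sequential latency. For a realistic picture you would also want concurrent clients, a data set close to your production size, and the same secondary indexes you plan to use.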

coffeemug
  • 2
    Thank you for the helpful answer! What's the largest RethinkDB deployment in production that you are aware of? Is it in the single-digit TB range, or would you say it's above that? – JE42 Nov 21 '13 at 07:49
  • 1
    I am facing the same issue. I have a table with a few billion entries, so I assume it's at least a few GB. Startup is a little slow in my case. You need to be very careful with secondary indexes. What load-testing tool do you suggest for RethinkDB? – Aman Gautam Dec 02 '13 at 13:35
  • 3
    Hi, now that Rethink is more mature, could you maybe consider posting a new answer? – DevLounge Jan 15 '16 at 20:35
  • 2
    Hi, very interested in hearing more about RethinkDB analytic capabilities regarding large data sets (1-10TB) now that RethinkDB is more mature. – Aviran Cohen Jun 02 '16 at 15:10
  • @coffeemug Thank you very much for your kind answer, but I would really like to know how to process a large amount of data with a simple query, what I should take care of and consider when creating an index, and how I should decide which field to index. I know I have packed a lot of questions into a single sentence. I have already read the RethinkDB docs, but they were somewhat unclear to me and did not help me solve my issue. – Dipak Jan 30 '18 at 13:46