In this presentation there was a chart that showed the following horizontal scalability ceiling as data gets larger:

key-value > column family > document database > graph database

http://youtu.be/UodTzseLh04?t=13m36s

In other words, as data gets more connected (i.e. complex), the ceiling on how large the database can grow gets lower.

Why is data size not as scalable for document databases compared to key-value stores? Have I answered my own question by saying "the more freedom in connecting data, the harder it is to partition data"?

(The "what I'm trying to do" part which everyone usually asks: I have a database with a schema that is MOSTLY tree-like but occasionally has nodes with 2 parents. I used Neo4j in my prototype, but for a production-scale app I'd need to think more about partitioning. I'm going to have to use MongoDB since graph databases cannot easily be partitioned, and it will be harder to write code for my "multiple parents" relationships in MongoDB. So I'm wondering if it's worth going the extra mile and using a key-value store - or at least a column family store.)

Sridhar Sarnobat

1 Answer

For graph databases ... I would consider looking at Titan for scalability. https://github.com/thinkaurelius/titan.

They wrote a good blog post recently about how their database engine stores data for scaling/performance: http://thinkaurelius.com/2013/11/01/a-letter-regarding-native-graph-databases/

Titan can also be configured to use Cassandra as its storage backend, so you get the benefit of a column family store as well.

I think you hit the nail on the head with your understanding of relationships (one entity relating to another) and scalability.

The more "joins" or "connections" you have to manage, the harder it is to scale.

Key/value systems assume you will relate data in your application. There is no concept of a query beyond looking a value up by its key, so to scale you can shard based on the key. Pretty easy and very scalable.
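
To make "shard based on the key" concrete, here is a minimal sketch of hash-based sharding. It is not how any particular key/value product does it; `shard_for` and `ShardedKV` are hypothetical names, and plain Python dicts stand in for separate servers.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key alone."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedKV:
    """Toy key/value store; each dict stands in for one server (an assumption)."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # Only the shard that owns the key is touched.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        # A lookup never needs to consult any other shard.
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedKV(num_shards=4)
store.put("user:1", {"name": "alice"})
store.put("user:2", {"name": "bob"})
print(store.get("user:1"))
```

Every operation touches exactly one shard, and no shard needs to know about any other. That independence is what makes this layout easy to scale out. Real systems typically add consistent hashing or range partitioning so that shards can be added without remapping most keys.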

If you read some of the articles about Titan it's easy to see why it's hard to scale something like a graph database.
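
As a rough illustration of the difficulty (a sketch only, not how Titan or Neo4j actually partitions data), you can hash-partition the nodes of a small mostly-tree graph and list the edges whose endpoints land on different shards. The node names, the edge list, and the `cross_shard_edges` helper are all made up for this example.

```python
import hashlib


def shard_for(node_id: str, num_shards: int) -> int:
    """Assign a node to a shard by hashing its id (same idea as key sharding)."""
    digest = hashlib.md5(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def cross_shard_edges(edges, num_shards: int):
    """Return the edges whose two endpoints are stored on different shards."""
    return [
        (a, b)
        for a, b in edges
        if shard_for(a, num_shards) != shard_for(b, num_shards)
    ]


# Hypothetical data: a tree where node "c" has two parents, "a" and "b".
edges = [("root", "a"), ("root", "b"), ("a", "c"), ("b", "c"), ("c", "d")]

cut = cross_shard_edges(edges, num_shards=3)
print(f"{len(cut)} of {len(edges)} edges cross a shard boundary")
```

Each edge in `cut` means a traversal has to hop between machines. A key/value lookup never pays that cost, while a graph query pays it whenever it follows a relationship across a partition boundary, and a node with several parents makes it harder to keep everything it connects to on one shard.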

ryan1234
  • Thanks Ryan. If you have any recommended articles I'd like to read them. I thought Titan would do slightly better at scaling. – Sridhar Sarnobat Nov 07 '13 at 23:19
  • I would read the Aurelius blog http://thinkaurelius.com/blog/ for more about Titan specifically. I'd also check out the creator of Titan, Marko Rodriguez: http://markorodriguez.com/ He is on Twitter and posts all the time about scaling, graph databases, etc. – ryan1234 Nov 09 '13 at 15:42
  • Thanks Ryan. I'll follow him more closely – Sridhar Sarnobat Nov 11 '13 at 04:59