I know that databases in general can scale horizontally using master/slave replication. This is a great strategy when the number of concurrent reads is growing.

As the number of concurrent writes or just the amount of data starts to grow, though, master/slave replication doesn't get you anything: every write still goes through the master, and every replica still holds the full data set. So you need to partition your data instead.

This works great for key-value scenarios. A classic example to me is TinyURL/bit.ly; reading/writing the data for short URL foo can be totally independent of reading/writing data for short URL bar.
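To make that concrete, here is a minimal sketch of what I mean by partitioning key-value data. The names `shard_for` and `ShardedStore` are made up for illustration, and plain dicts stand in for separate database servers; a real deployment would likely want something like consistent hashing so that changing the shard count doesn't move most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is stable across runs.
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store: each dict plays the role of one independent database server."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, short: str, url: str) -> None:
        self.shards[shard_for(short, len(self.shards))][short] = url

    def get(self, short: str):
        return self.shards[shard_for(short, len(self.shards))].get(short)


store = ShardedStore(num_shards=4)
store.put("foo", "http://example.com/a")
store.put("bar", "http://example.com/b")
```

Looking up `foo` only ever touches the shard that `foo` hashes to, which is why this scales so easily: no operation needs data from two shards at once.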

But what are you supposed to do if you're in a graph scenario? More concretely, is it possible to partition a graph database like Neo4j at all? If so, how?

I can't wrap my head around how you could possibly break up a graph without defeating the purpose of using a graph database (efficient traversals).

Aseem Kishore
  • Have a look at what Jim Webber wrote on the topic: [On Sharding Graph Databases](http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx) and [Scaling Neo4j with Cache Sharding and Neo4j HA](http://jim.webber.name/2011/02/23/abe72f61-27fb-4c1b-8ce1-d0db7583497b.aspx)! – nawroth Mar 18 '11 at 09:57

1 Answer

You rarely traverse an entire graph structure.

Further, graph structures are rarely densely connected across all the nodes.

With a little care, you can locate clusters of well-connected nodes separated by a small number of connections to other clusters.

http://en.wikipedia.org/wiki/Cluster_analysis
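As a rough illustration of the idea, here is a minimal sketch that assigns nodes to partitions by greedy breadth-first region growing. This is a crude stand-in for real cluster analysis, not a method from Neo4j or any library; `grow_partitions` is a hypothetical name, and the sketch assumes every node appears as a key in the adjacency dict.

```python
from collections import deque


def grow_partitions(adjacency, num_partitions):
    """Assign nodes to partitions by growing BFS regions of roughly equal size.

    `adjacency` maps each node to an iterable of its neighbours (assumed
    undirected, with every node present as a key).
    """
    nodes = sorted(adjacency)
    target = -(-len(nodes) // num_partitions)  # ceiling division
    assignment = {}
    part, size = 0, 0
    for seed in nodes:
        if seed in assignment:
            continue
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            if node in assignment:
                continue
            if size == target:
                # Current partition is full: open the next one and regrow from here.
                part, size = part + 1, 0
                queue.clear()
            assignment[node] = part
            size += 1
            queue.extend(n for n in sorted(adjacency[node]) if n not in assignment)
    return assignment


# Two triangles joined by a single bridge edge (c-d).
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
parts = grow_partitions(graph, num_partitions=2)
```

On this toy graph the two triangles end up in separate partitions, so only the bridge edge crosses a partition boundary. On real data a proper clustering or graph-partitioning algorithm would do much better than this heuristic.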

If you partition based on clustering, then traversal within the cluster may be faster, but traversal to another cluster will be slower.

The overall benefit of partitioning depends on the ratio of in-cluster traversals to between-cluster traversals.
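One way to get a feel for that ratio before committing to a partitioning is to count how many edges cross partition boundaries. The sketch below is only a static proxy, assuming every edge is equally likely to be traversed; the real ratio depends on your query workload. `cut_ratio` and the sample data are hypothetical.

```python
def cut_ratio(edges, assignment):
    """Fraction of edges whose endpoints live in different partitions."""
    edges = list(edges)
    if not edges:
        return 0.0
    crossing = sum(1 for u, v in edges if assignment[u] != assignment[v])
    return crossing / len(edges)


# Two triangles joined by one bridge edge, split into partitions 0 and 1.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("d", "e"), ("d", "f"), ("e", "f"),
    ("c", "d"),  # the only cross-partition edge
]
assignment = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
ratio = cut_ratio(edges, assignment)  # 1 crossing edge out of 7
```

A low value suggests most traversals stay local to one partition; a value that creeps upward as the graph evolves is a sign the partitioning needs to be revisited.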

S.Lott
  • I had thought about clustering, but wasn't sure if it was reasonable or not; thanks for confirming. It does have a clear trade-off though (you risk inefficient traversals if nodes in one cluster start connecting to nodes in another cluster), so I'd still love to know if there are other options. – Aseem Kishore Mar 17 '11 at 19:37