Just wondering if anyone has any information on the status of Project Rassilon, Neo4j's side project which focuses on improving Neo4j's horizontal scalability?

It was first announced in January 2013 here.

I'm particularly interested in knowing more about when the graph size limitation will be removed and when sharding across clusters will become available.

– Mike

1 Answer

The node & relationship limits are going away in 2.1, the next release after 2.0 (which now has a release candidate).

Rassilon is definitely still in the mix. That said, that work is not taking precedence over things like the significant bundle of new features in 2.0. The reason is that Neo4j, as it stands today, is extremely capable of scaling, using the variety of architectural features outlined at the link below (with some live examples):

www.neotechnology.com/neo4j-scales-for-the-enterprise/

There's a lot of cleverness in the current architecture that allows the graph to perform and scale well without sharding, because once you start sharding, you are destined to traverse over the network, which is a bad thing for latency, query predictability, etc. So while there are some extremely large graphs that, largely for write-throughput reasons, must trade off performance for uber scale (by sharding), the happy thing is that most graphs don't require this compromise. Sharding is required only in the 1% case, which means that nearly everyone can have their cake and eat it too. There are currently Neo4j clusters in production at customers with 1B+ individuals in their graph, backing web applications with tens of millions of users, on comparatively small (but very fast, very efficient) clusters. To give you some idea of the kind of price-performance we regularly see: we've had users tell us that a single Neo4j instance could do the same work as 10 Oracle instances, only faster.

A well-tuned Neo4j cluster can support upwards of 10K transactional writes per second, and an arbitrarily high number of reads per second; read throughput scales linearly as instances are elastically plugged in. Cache sharding is a design pattern that routes requests for the same part of the graph to the same instance, so that no single machine has to keep the entire graph in memory (see the sketch below).
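For illustration, here's a minimal sketch of the routing idea behind cache sharding, assuming a hypothetical ConsistentRouter class sitting in front of the cluster (the class, hash scheme, and instance URLs are illustrative, not part of any Neo4j API). Requests for the same key always land on the same instance, so each instance's cache warms up on just its own slice of the graph:

    import java.util.List;

    // Hypothetical router illustrating cache sharding: route each request key
    // to a fixed instance, so every instance caches only the subgraph its
    // keys actually touch, rather than the whole graph.
    public class ConsistentRouter {
        private final List<String> instanceUrls;

        public ConsistentRouter(List<String> instanceUrls) {
            this.instanceUrls = instanceUrls;
        }

        // Same userId -> same bucket -> same instance -> warm cache.
        public String instanceFor(String userId) {
            int bucket = Math.floorMod(userId.hashCode(), instanceUrls.size());
            return instanceUrls.get(bucket);
        }

        public static void main(String[] args) {
            ConsistentRouter router = new ConsistentRouter(List.of(
                    "http://neo4j-1:7474", "http://neo4j-2:7474", "http://neo4j-3:7474"));
            // Every request for "alice" hits the same replica, so the
            // neighborhood around alice stays hot in that replica's cache.
            System.out.println(router.instanceFor("alice"));
            System.out.println(router.instanceFor("alice")); // same instance again
        }
    }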

– Philip Rathle

  • As you are saying, "a Neo4j cluster can support upwards of 10K transactional writes per second," but I don't think so. In my case, I got locking errors at 200 transactional write requests per second, and worse, the database could not handle any other requests after that. – Avinash Sep 10 '15 at 06:01
  • Would someone please respond to @Avinash's issue? Was this really a scalability issue, or just some issue he faced? – Raghav Oct 20 '16 at 21:19
  • I must say, Neo4j is not a production-ready database. Unfortunately, I had to switch databases. – Avinash Jan 23 '17 at 06:25
  • In the absence of a Neo4j response, I'm guessing here: query-time "index-free adjacency" is paid for with every write, since real-world IDs (bank/IP/phone/email IDs) need looking up in an index to be converted into Neo4j's internal pointers on insertion. Write locks are also required when adding edges to nodes, which can cause deadlocks with multi-threaded writes. I suspect some of the claimed write speeds are for initial bulk loads with indexes/transactions turned off, rather than realistic scenarios with ongoing incremental additions. See http://neo4j.com/docs/operations-manual/current/tools/import/ – MarkH Jun 23 '17 at 09:35
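
A common mitigation for the multi-threaded deadlocks MarkH describes is to run writes inside the driver's managed transaction functions, which roll back and retry on transient errors (deadlocks included) rather than failing on the first lock conflict. Below is a minimal sketch using the Neo4j Java driver's writeTransaction API; note that this driver postdates the thread, and the bolt URI, credentials, and MERGE query are placeholder assumptions:

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Session;

    import static org.neo4j.driver.Values.parameters;

    public class RetriedWrite {
        public static void main(String[] args) {
            // Placeholder URI and credentials -- substitute your own cluster details.
            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                    AuthTokens.basic("neo4j", "password"));
                 Session session = driver.session()) {

                // writeTransaction() is a managed transaction: on a transient
                // failure such as a deadlock, the driver rolls back and retries
                // with backoff instead of surfacing the lock error immediately.
                session.writeTransaction(tx -> {
                    tx.run("MERGE (a:Person {id: $from}) " +
                           "MERGE (b:Person {id: $to}) " +
                           "MERGE (a)-[:KNOWS]->(b)",
                            parameters("from", "alice", "to", "bob"));
                    return null;
                });
            }
        }
    }

Beyond retries, serializing or batching writes that touch the same hot nodes reduces lock contention in the first place, which is often what ongoing incremental-load scenarios like Avinash's need.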