The project I am working on currently uses Neo4j Community. At the moment we process 1-5M vertices with 5-20M edges, but we aim to handle 10-20M vertices with 50-100M edges. We are discussing switching to an open source graph database that would let us scale to that size. So far our mind is set on Janusgraph with Cassandra.
We have some questions regarding the capabilities and development of Janusgraph, and we would be glad if someone could answer! (Maybe Misha Brukman or Aaron Ploetz?)
On Janusgraph capabilities:
We ran some experiments using the ready-to-use Janusgraph Docker image, with queries issued from a Java program. The Java program and the Docker image run on the same machine. With 10k-20k vertices and 50k-100k edges inserted, a query on all the vertices possessing a given property takes 8 to 10 seconds (mean over 10 identical queries, measured as the time elapsed between just before and just after the command in the Java program). The command itself is really simple:
g.V().has("secText", "some text").inE().outV();
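A minimal sketch of this kind of measurement, under a few assumptions: `QueryTimer` and `meanMillis` are illustrative names, not the actual program, and the warm-up runs are an addition that the numbers above do not include. The placeholder workload stands in for the traversal. Note that in Gremlin-Java a traversal is lazy, so the real query only executes once a terminal step such as `toList()` or `iterate()` is called on it.

```java
public class QueryTimer {

    /**
     * Runs the query `warmup` times without measuring, then returns the
     * mean wall-clock time in milliseconds over `runs` measured executions.
     */
    public static double meanMillis(Runnable query, int warmup, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        // Unmeasured runs so that JIT compilation and connection setup
        // do not dominate the measured average.
        for (int i = 0; i < warmup; i++) {
            query.run();
        }
        long totalNanos = 0L;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Placeholder workload (hypothetical). In the real program this would be
        // something like:
        //   () -> g.V().has("secText", "some text").inE().outV().toList()
        Runnable query = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        double mean = meanMillis(query, 2, 10);
        System.out.printf("mean over 10 runs: %.3f ms%n", mean);
    }
}
```

Separating warm-up from measured runs makes it easier to tell whether the 8 to 10 seconds come from the query itself or from one-time startup costs.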
Moreover, the Docker image seems to break down when I try to insert more records (going towards 100k vertices).
We wonder whether this is due to the limited nature of the Docker image, whether there is a problem on our side, or whether it could be normal. Either way, it seems really, really slow.
We also set up a two-node Cassandra cluster (on two different VMs) with Janusgraph on top; again, the results were quite slow.
From what I read on the Internet, people seem to use Janusgraph deployments with millions of vertices in production, so I guess they can execute simple queries in a matter of milliseconds. What is the secret there? Do you need something like 128GB of RAM for the whole thing to perform correctly? Or maybe there is a guide of good practices to follow that I am unaware of? I tried my best using the official Janusgraph documentation and user comments on forums like this one, but that ain't much I'm afraid :/
On Janusgraph future:
- Janusgraph seemed to evolve quite quickly during its first years (2016-2018), but over the past few months I haven't seen much activity from the Janusgraph community, apart from the release of version 0.5 a few months ago. For example, there has been no meeting since last year. So I'm wondering: is Janusgraph on the right track to last and be maintained for many years to come? Did things slow down a bit because of COVID, or is there another reason?
- Is backward compatibility considered in Janusgraph? From what I can read in the docs, many things changed from versions 0.2/0.3 to 0.4 and 0.5, and more changes are coming, for example Cassandra Thrift and embedded mode being deprecated. In a production environment where we can't always afford to upgrade every year, let alone modify code whenever a component is deprecated, do the Janusgraph developers plan to provide some backward compatibility soon, or should we wait for version 1.0 for that?
Thank you for reading all this. I am looking forward to any answers you can give me :) Have a nice day!
Mael