
I'm looking for options for a graph database to use in a project. I expect about 100,000 writes (vertex + edge) per day and far fewer reads (several per hour). The most frequent query is a two-hop traversal that I expect to return about 10-20 result nodes. I don't have experience with graph databases, and I want to use Gremlin so that I can switch to another graph database if needed. I'm currently considering two options: Neo4j and Titan.
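To make the "two-hop traversal" concrete, here is a minimal plain-Python sketch of what it computes, assuming a hypothetical in-memory adjacency map (vertex to list of out-neighbours). It is not Gremlin or any database API; it only mirrors roughly what a Gremlin traversal such as `g.V(start).out().out().dedup()` would return.

```python
def two_hop(adjacency, start):
    """Return distinct vertices reachable in exactly two outgoing hops from start.

    Roughly mirrors g.V(start).out().out().dedup() in Gremlin; the adjacency
    dict is a hypothetical stand-in for the database, not a real graph store.
    """
    result = []
    seen = set()
    for mid in adjacency.get(start, ()):       # first hop
        for end in adjacency.get(mid, ()):     # second hop
            if end not in seen:                # dedup, keeping first-seen order
                seen.add(end)
                result.append(end)
    return result


# Hypothetical toy graph: vertex -> list of out-neighbours.
graph = {
    "a": ["b", "c"],
    "b": ["d", "e"],
    "c": ["e", "f"],
}

print(two_hop(graph, "a"))  # ['d', 'e', 'f']
```

With only 10-20 results per query and a few queries per hour, the work per read is small; the harder sizing question is the write volume and total graph size.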

As far as I can see, Neo4j has a large community and plenty of information and tools, so I'd prefer to start with it. Its capacity limits (∼34 billion nodes, ∼34 billion edges) should be enough for our needs. However, I'm not sure what hardware requirements I would face in this case. I also didn't see any options for parallelizing its queries.
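A quick back-of-the-envelope check of the numbers above, as a sketch. It assumes each write adds one new vertex (plus one edge) and that writes are spread evenly over the day; real peak rates may be considerably higher than the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60

writes_per_day = 100_000            # from the question
node_limit = 34_000_000_000         # "~34 billion nodes" quoted in the question

# Assumption: every write creates exactly one new vertex, evenly spread over the day.
avg_writes_per_second = writes_per_day / SECONDS_PER_DAY
days_to_limit = node_limit / writes_per_day
years_to_limit = days_to_limit / 365

print(f"{avg_writes_per_second:.2f} writes/s on average")   # 1.16 writes/s on average
print(f"{years_to_limit:,.0f} years to reach the node limit")  # 932 years to reach the node limit
```

Under these assumptions the average write rate is only about one per second and the node limit is centuries away, so the capacity limit itself is unlikely to be the deciding factor.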

On the other hand, Titan is built for horizontal scalability and integrates with massively parallel tools like Spark, so I would expect hardware requirements to scale roughly linearly. However, there is much less information, community support, and tooling for Titan.

I'd be glad to hear your suggestions.

Olga Gorun

1 Answer


Sebastian Good gave a great presentation comparing several databases. You can find his results here.

A quick summary of the presentation is here: [image: summary of the presentation, not available]

For benchmarks of each graph database with different datasets, node sizes, and cache settings, have a look at this GitHub repository by socialsensor. Note that the results in the repo differ somewhat from those in the presentation.

My personal recommendation is:

  1. If you have deep pockets, go for Neo4j. With its technical support and the easy-to-learn Cypher query language, things will move quickly.

  2. If you support open source (and are patient with its development cycles), go for Titan DB with the Amazon DynamoDB backend. This will give you "infinite" scalability and good performance with both EC2 machines and DynamoDB tables. See here for the docs and here for the code.

Mohamed Taher Alrefaie
  • Thank you for the answer. Can you add something about hardware requirements and cost efficiency for the use case described? – Olga Gorun Feb 15 '16 at 22:36
  • @OlgaGorun Have a look here for Neo4j pricing http://neo4j.com/subscriptions/ . For Titan/DynamoDB, have a look here http://tinyurl.com/gpv9t43 – Mohamed Taher Alrefaie Feb 16 '16 at 21:24