
I am currently building a knowledge graph for an e-commerce company, and it mainly consists of the product category hierarchies, properties, and relations among them. Besides the common relational queries, we care about the following points very much:

  1. Master-slave cluster support. This graph database will be used for online search query processing, so high availability is crucial to us. The data volume won't reach millions of nodes, so we don't need a distributed cluster that spreads data across multiple machines. Rather, we need multiple machines that can serve reads simultaneously, so that the service stays up even if one of the machines goes offline.

  2. Fast online query performance. Reasoning about relations can be done offline, so its performance is not that important. But we need to run a lot of online queries like "find the nodes whose property P equals value V", so we need good performance for online query processing. This database will be read-intensive and won't change much after its initialization.

  3. Community and documentation. Since our team is new to graph databases, we expect user-friendly documentation for deployment and development, and an active community that can help solve problems.

Based on the requirements above, I investigated some candidates:

  1. Neo4j. We first tried Neo4j since it's the most popular one in the field. Actually, I liked it, especially the Cypher query language. But we are about to abandon it because the community edition does not support clustering, and currently we don't have the budget to pay for the enterprise edition.

  2. OrientDB. OrientDB seems to be the second most popular one on the market, and it seems to support clustering in its community edition. I use the word "seems" because it is not clearly stated on its website. Can anyone clear this up? Besides, I found a negative article about OrientDB which makes me hesitate: http://orientdbleaks.blogspot.jp/2015/06/the-orientdb-issues-that-made-us-give-up.html

  3. Titan. Titan is also great, but since its original company has been acquired and its original developers are developing a different product, its future development and maintenance are in doubt.

  4. ArangoDB. This one seems to be very fast according to the performance report (https://www.arangodb.com/2015/10/benchmark-postgresql-mongodb-arangodb/), but I don't know if its online query processing is good enough, and I also don't know how well it supports clustering.

As for documentation and community, I really have no idea since these are the kind of things that you only get to know after you start doing it.

To sum up, based on my requirements, I think OrientDB and ArangoDB may be my candidates, but I don't know which one to choose because of the points I stated above. Or is there another suitable candidate that I'm missing?

Thanks.

Arsen Khachaturyan
Derrick Zhang
  • For Neo4j, there is a startup program offering free usage of the Enterprise Edition (without support) and a very affordable price with support. Community-wise, Neo4j has the most responsive community ever! – Christophe Willemsen Nov 11 '16 at 13:48
  • Unfortunately (even though you're getting answers) - this question is off-topic, as you're asking for a tool/product recommendation. – David Makogon Nov 12 '16 at 15:35

3 Answers


Max working for ArangoDB here. ArangoDB does not only do online queries for graphs, but due to its multi-model nature you can mix graph queries with document queries (using secondary indexes), key lookups and joins. It has a sophisticated query engine with an optimizer that is fully aware of the ArangoDB cluster structure and can optimize and distribute query executions across all instances.
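To make the role of a secondary index concrete for lookups like "find the nodes whose property P equals value V", here is a minimal in-memory sketch in Python. It is only an illustration of the general idea and not ArangoDB code: the class `PropertyIndex` and its methods are hypothetical names, and it assumes property values are hashable and node ids are sortable.

```python
from collections import defaultdict


class PropertyIndex:
    """Toy secondary index mapping (property, value) -> set of node ids."""

    def __init__(self):
        self._nodes = {}                 # node id -> properties
        self._index = defaultdict(set)   # (property, value) -> node ids

    def add_node(self, node_id, properties):
        # Store the node and register every property value in the index.
        self._nodes[node_id] = dict(properties)
        for prop, value in properties.items():
            self._index[(prop, value)].add(node_id)

    def find(self, prop, value):
        # Indexed lookup: one dictionary access instead of a full scan.
        return sorted(self._index.get((prop, value), ()))

    def find_by_scan(self, prop, value):
        # Unindexed baseline: inspects every node.
        return sorted(
            node_id
            for node_id, props in self._nodes.items()
            if props.get(prop) == value
        )


catalog = PropertyIndex()
catalog.add_node("p1", {"color": "red", "brand": "Acme"})
catalog.add_node("p2", {"color": "blue", "brand": "Acme"})
catalog.add_node("p3", {"color": "red", "brand": "Zenith"})
print(catalog.find("color", "red"))
```

Both methods return the same ids; the indexed one avoids touching nodes that cannot match, which is what matters for a read-intensive workload.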

In a cluster, sharding, synchronous replication and self-healing are all fully automatic with configurable parameters. Deployment of an ArangoDB cluster is particularly simple (literally two clicks) on Apache Mesos or DC/OS, but is also relatively straightforward with other orchestration frameworks. ArangoDB on DC/OS additionally allows you to scale up and down via the graphical user interface or REST API calls, and failed tasks are automatically replaced.

As for performance, all our benchmarks show very good results. The just-released version 3.1 even has vertex-centric indexes, which are particularly important for graph queries.
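The idea behind a vertex-centric index can be sketched in a few lines of Python. This is an illustration under the assumption that such an index covers the combination of an edge's source vertex and one of its attributes; it is not ArangoDB's implementation, and `EdgeStore` and its method names are hypothetical.

```python
from collections import defaultdict


class EdgeStore:
    """Toy edge store with a plain adjacency list and a vertex-centric index."""

    def __init__(self):
        self._out = defaultdict(list)        # source -> [(label, target)]
        self._by_label = defaultdict(list)   # (source, label) -> [target]

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))
        self._by_label[(source, label)].append(target)

    def neighbors_scan(self, source, label):
        # Without the index: walk every outgoing edge of the vertex.
        return [t for (l, t) in self._out.get(source, []) if l == label]

    def neighbors_indexed(self, source, label):
        # With the index: jump straight to the edges carrying this label.
        return list(self._by_label.get((source, label), []))


edges = EdgeStore()
edges.add_edge("electronics", "has_subcategory", "phones")
edges.add_edge("electronics", "has_subcategory", "laptops")
edges.add_edge("electronics", "has_property", "voltage")
print(edges.neighbors_indexed("electronics", "has_subcategory"))
```

The benefit shows up on vertices with many edges, such as a top-level category, where only a small fraction of the edges match the attribute being filtered on.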

We do our best to provide extensive documentation, which you can find at https://www.arangodb.com/documentation/ . We have a user manual, a manual for our query language AQL, and one for the HTTP/REST API. Furthermore, we have tutorials, frequently asked questions, and a "Cookbook" for standard tasks, and we try to answer questions on StackOverflow and GitHub issues in a timely manner.

All of this is included in the Community Edition, which is available with the Apache 2.0 open source license.

If you have more questions, do not hesitate to reach out to our team or to me personally.

Max Neunhöffer
  • As a user of ArangoDB, and as someone who has also considered and tested Neo4J and OrientDB, I strongly recommend ArangoDB. It performs remarkably well, is highly robust, and has truly excellent community support. My simple queries run in hundreds of microseconds on ArangoDB, and complex ones perform well even before being optimized. The query language is simple, intuitive, yet powerful. After months of continuous, heavy use, I have experienced very few issues, and I got a detailed response from the community or from the developers on every question I've asked within one or two days. – Nate Gardner Nov 12 '16 at 01:07
  • Thanks Max! I think ArangoDB is a good product, but I have some concerns about AQL, because it's imperative rather than declarative like SQL or OrientDB SQL, which means the user needs to be aware of the logical structure of the data to write non-trivial queries. Also, I'd like to know if the TinkerPop framework is well supported. – Derrick Zhang Nov 12 '16 at 02:37

OrientDB Community Edition is free, open source software that is built by a community of developers and is constantly improving. Features such as horizontal scaling, fault tolerance, clustering, sharding, and replication aren’t disabled in the Community Edition.

For more information about clustering, take a look at the official OrientDB guide: http://orientdb.com/docs/last/Tutorial-Clusters.html

Hope it helps.

Regards

Michela Bonizzi

Neo4j Enterprise Edition can be used under the AGPL license. I am surprised that a lot of people aren't aware of this. If you are using Neo4j Enterprise as a server and communicating with it via REST or the Bolt protocol (Apache licensed), then you don't have to worry about releasing the code of the system connecting to it under the AGPL.

If you are using it embedded, then you have to adhere to the AGPL, but then why would you need Neo4j Enterprise in that situation?

Remember to clone and compile Neo4j Enterprise from GitHub if you plan on using it under the AGPL; don't download the trial.

Neo Technology gives great support, and that is essentially what you are paying for with the enterprise subscription.

  • This is incorrect. To quote Philip Rathle from this answer (https://stackoverflow.com/questions/24646962/neo4j-how-to-setup-failover-in-community-edition): a web-facing app using Neo4j Enterprise without a commercial license from Neo Tech will need to be open sourced under the AGPL. – shafeen Sep 13 '18 at 17:59