I am currently building a knowledge graph for an e-commerce company, and it mainly consists of the product category hierarchies, properties, and relations among them. Besides the common relational queries, we care about the following points very much:
Master-slave cluster support. This graph database will be used for online search query processing, so high availability is crucial to us. The data volume won't be as big as millions of nodes, so we don't need a distributed cluster that can span data across multiple machines. Still, rather we may need multiple machines that can be read simultaneously, and the service won't go down even if one of the machines is offline.
Fast online query performance. Reasoning about relations can be done offline, so the performance is not that important. But we need to do a lot of online queries like "find the nodes whose property P equals to value V", so we need good performance for online query processing. This database will be read-intensive and won't be changed very much after it's initialization.
Community and documentation. Since our team is new to the field of a graph database, so we expect user-friendly documentation for deployment and development and an active community for solving problems.
Based on the requirements above, I investigated some candidates:
- Neo4j. We first tried Neo4j since it's the most popular one in the field. Actually, I liked it, especially the Cypher query language. But we are about to abandon it because the community edition does not support any cluster, and currently, we don't have the budget to pay for the enterprise edition.
OrientDB. OrientDB is like the second most popular one on the market, and it seems to support cluster in its community edition. I use the word "seems" because it is not clearly stated on its website. Can anyone clear this out? Besides, I found a negative article about OrientDB which makes me hesitate: http://orientdbleaks.blogspot.jp/2015/06/the-orientdb-issues-that-made-us-give-up.html
Titan. Titan is also great, but since its original company has been acquired and its original developers are developing a different product, its future development and maintenance are in doubt.
- ArangoDB. This one seems to be very fast, according to the performance report(https://www.arangodb.com/2015/10/benchmark-postgresql-mongodb-arangodb/), but I don't know if its online query processing ability is good enough, and its support for the cluster is also unknown to me.
As for documentation and community, I really have no idea since these are the kind of things that you only get to know after you start doing it.
To sum up, based on my requirements, I think OrientDB and ArangoDB maybe my candidates, but I don't know which one to choose because of the points I stated above. Or perhaps is there any other right candidate that I'm missing?
Thanks.