I plan to use Titan for a graph data model, along with Faunus.

Choice of data store: I have yet to decide on the data store, though Cassandra seems to be the obvious choice. Has anyone benchmarked Titan with other data stores?

Push notifications: I need to push traversal responses to the client. Are there any case studies using Node.js (event-based) or Vaadin (object-based)?

Thanks!

1 Answer

I experimented with Titan for a medium-sized P&C insurer, assuming that Cassandra was the optimal choice for 3-4 million insurance policies (because that sounded big). I was surprised to find that BerkeleyDB and PersistIt were better fits.

Key takeaway: Each of the backends has strengths, and you need to weigh those strengths against the characteristics of your data set. Here is a short summary:

BerkeleyDB and PersistIt

Practically limited to graphs with tens to hundreds of millions of vertices. For graphs of that size, however, both storage backends exhibit high performance because all data can be accessed locally within the same JVM.

Hazelcast

A low-latency alternative that excels at read-mostly workloads that access the graph uniformly. Note that Hazelcast does not provide durable persistence. Ideal graphs for this backend fit entirely into memory on one or more machines. Also, for this storage backend to be cost-effective, most of the graph should be accessed regularly.

HBase and Cassandra

These backends are for large graphs (billions to hundreds of billions of vertices). Note that they will generally be outperformed by BerkeleyDB or PersistIt on small to medium-sized graphs. The choice between the two comes down to a choice between a consistent, partition-tolerant system (HBase) and an available, partition-tolerant system (Cassandra).

You can also think about this in terms of the semi-antiquated CAP theorem:

https://github.com/thinkaurelius/titan/wiki/images/titan-captheorem.png
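To make the summary above easier to apply, here is a rough sketch of those rules of thumb as a small Python helper. It is only an illustration: the `Workload` fields, the `candidate_backends` function, and the `LOCAL_MAX_VERTICES` cut-off are my own hypothetical names and numbers, not anything Titan provides, and the threshold is just the upper end of the "hundreds of millions" range mentioned above. Graphs between that cut-off and "billions" are treated as distributed here, which is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Assumed cut-off: upper end of the "tens to hundreds of millions of
# vertices" range given for the embedded backends. Tune for your data.
LOCAL_MAX_VERTICES = 100_000_000


@dataclass
class Workload:
    """Hypothetical description of a graph workload (not a Titan type)."""
    vertices: int
    fits_in_memory: bool = False      # whole graph fits in cluster RAM
    read_mostly: bool = False         # few writes, uniform reads
    needs_durability: bool = True     # data must survive restarts
    prefer_consistency: bool = True   # CP (HBase) vs AP (Cassandra)


def candidate_backends(w: Workload) -> List[str]:
    """Return backends worth benchmarking first, per the rules of thumb."""
    candidates: List[str] = []

    # Hazelcast: in-memory, read-mostly, and no durable persistence needed.
    if w.fits_in_memory and w.read_mostly and not w.needs_durability:
        candidates.append("Hazelcast")

    if w.vertices <= LOCAL_MAX_VERTICES:
        # Small to medium graphs: embedded stores in the same JVM.
        candidates += ["BerkeleyDB", "PersistIt"]
    else:
        # Large graphs: choose by which CAP trade-off you prefer.
        candidates.append("HBase" if w.prefer_consistency else "Cassandra")

    return candidates
```

For a graph on the order of a few million vertices, such as `candidate_backends(Workload(vertices=4_000_000))`, this suggests starting with BerkeleyDB and PersistIt, which matches what I saw in practice. Treat the result as a shortlist to benchmark, not a verdict.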

Jonathan Schneider