4

I am looking for the differences between Titan and Spark-GraphX, and for guidance on which one is best to use. I searched online but could not find an article on this.

Could someone provide pointers on this?

Jeen Broekstra

2 Answers

7

The Apache TinkerPop project documentation provides a nice overview of the difference between OLTP graph tools (graph databases such as Titan) and OLAP graph tools (graph engines such as Spark-GraphX).

It is not a question of which one (Titan or Spark-GraphX) is best because they do different things.

Titan supports many users simultaneously issuing targeted queries on a very large graph, where each query starts at a single node (or only a few nodes) and makes short traversals into the graph before an answer is returned.

Graph engines like Spark-GraphX run batch processes that examine substantial parts or all of a graph to get the big picture, for example with a clustering algorithm or a shortest path calculation.
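To make the distinction concrete, here is a minimal sketch in plain Python. It uses neither Titan nor GraphX; the toy graph, the vertex names, and both function names are made up for illustration. The first function behaves like a graph database query (start at one vertex, touch only its neighborhood), while the second behaves like a graph engine job (visit every vertex to label clusters).

```python
from collections import deque

# Toy adjacency list; the vertex names are hypothetical.
GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob"],
    "erin": ["frank"],
    "frank": ["erin"],
}


def neighbors_within(graph, start, max_hops):
    """Database-style query: begin at one vertex and make a short traversal."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        vertex, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph[vertex]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    seen.discard(start)
    return seen


def connected_components(graph):
    """Engine-style job: scan the whole graph and label each cluster."""
    label = {}
    for root in graph:
        if root in label:
            continue  # already assigned to an earlier component
        label[root] = root
        stack = [root]
        while stack:
            vertex = stack.pop()
            for nxt in graph[vertex]:
                if nxt not in label:
                    label[nxt] = root
                    stack.append(nxt)
    return label
```

The cost of `neighbors_within` depends on the size of the neighborhood, not on the size of the graph, which is why many such queries can run concurrently. `connected_components` necessarily reads every vertex and edge, which is the kind of work that is usually run as a distributed batch job.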

Often the best graph solutions include both a graph database and a graph engine. One comparison that is valid, and that you should be aware of, is the TinkerPop SparkGraphComputer versus Spark-GraphX.

Many consider the TinkerPop SparkGraphComputer to be a better graph engine approach than GraphX for at least two reasons:

  1. Using SparkGraphComputer in TinkerPop, you can seamlessly run graph engine algorithms that pull directly from your TinkerPop-compliant graph database, such as Titan, giving you both graph database and graph engine capabilities pre-integrated.
  2. SparkGraphComputer arguably has a nicer programming model for developing custom algorithms. With GraphX, unless the canned algorithms are enough for you, you have to drop down to its Pregel API to write customized algorithms.
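For readers unfamiliar with the Pregel model mentioned above, the following is a rough sketch of the idea in plain Python. It is not the GraphX Pregel API and not TinkerPop code; the function name and the edge format are assumptions made for illustration. It only shows the general shape of a vertex-centric computation: in each superstep, vertices that received messages update their state and send new messages along their outgoing edges, and the job ends when no messages remain. The example computes single-source shortest paths and assumes non-negative edge weights.

```python
import math


def pregel_shortest_paths(edges, source):
    """Vertex-centric shortest paths.

    edges maps each vertex to a list of (neighbor, weight) pairs.
    Returns a dict of vertex -> distance from source.
    """
    vertices = set(edges)
    for outs in edges.values():
        vertices.update(dst for dst, _ in outs)

    dist = {v: math.inf for v in vertices}
    inbox = {source: [0.0]}  # initial message activates the source

    while inbox:  # one loop iteration corresponds to one superstep
        outbox = {}
        for vertex, messages in inbox.items():
            best = min(messages)  # combine incoming messages
            if best < dist[vertex]:
                dist[vertex] = best  # vertex state improved
                for dst, weight in edges.get(vertex, []):
                    outbox.setdefault(dst, []).append(best + weight)
        inbox = outbox  # messages are delivered in the next superstep
    return dist
```

A custom algorithm in this style boils down to three pieces: how to combine incoming messages, how to update the vertex state, and what to send to neighbors. Real engines distribute the per-vertex work across a cluster, which this single-process sketch does not attempt.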
drobin
  • But it is said that Apache Spark GraphX doesn't scale well and is suited to working on static graphs. My question is: if I have millions of edges and vertices in a Titan DB, will I be able to process them using the GraphX distributed processing engine on top of Titan? – Bhavesh Gadoya Mar 24 '17 at 08:42
2

Titan is an implementation of a graph database. It is used together with a storage backend, such as HBase or Cassandra, where it keeps the underlying data.

GraphX is an "API for graphs and graph-parallel computation". Put simply, GraphX can be used to query and manipulate graph data that lives in an existing database such as Titan, once that data has been loaded into Spark. It does not store any data by itself.

imriqwe
  • Isn't GraphX itself a graph database like Titan? How can one use a Titan-Cassandra graph DB with GraphX or SparkGraphComputer for graph mining, processing, and machine learning? – Parag Jul 31 '17 at 02:21