
We want to represent our data as a graph and thought about using one of the graph databases. During our vendor investigation, one of the experts suggested that a graph database won't be efficient on a dense graph and that we'd be better off with a columnar database like Cassandra:

I gave your use case some thought. Given that your graph is very dense (number of relationships = number of nodes squared) and that you seem to need only a few hops of traversal from a particular node along different relationships, I'd actually recommend you also try out a columnar database.

Graph databases tend to work well when you have sparse graphs (num of relationships << num of nodes ^ 2) and with deep traversals - from 4-5 hops to hundreds of hops. If I understood your use-case correctly, a columnar database should generally outperform graphs there.

Our use case will probably end up with nodes connected to tens of millions of other nodes, with about 30% overlap between different nodes, so in a way it's probably a dense graph. Overall there will probably be a few billion nodes.
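As a back-of-the-envelope check of how dense that really is, here is a minimal sketch. The concrete figures (3 billion nodes, an average of 10 million neighbours per node, undirected edges) are assumptions picked from the rough ranges above, not measured values.

```python
# Rough density estimate; every figure here is an illustrative assumption.
num_nodes = 3_000_000_000   # assumed value for "a few billion nodes"
avg_degree = 10_000_000     # assumed value for "tens of millions" of neighbours

# Each undirected edge is counted once although it touches two endpoints.
num_edges = num_nodes * avg_degree // 2

# Maximum number of edges in a simple undirected graph on num_nodes vertices.
max_edges = num_nodes * (num_nodes - 1) // 2

# Density = actual edges / possible edges (1.0 would be a complete graph).
density = num_edges / max_edges

print(f"edges   ~ {num_edges:.2e}")
print(f"density ~ {density:.4f}")
```

Under these assumptions the density comes out at roughly 0.3%, so the number of relationships is far below the number of nodes squared, even though the average degree is still very large.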

Looking at the Neo4j source code, I found some references to an isDense flag on nodes that switches the processing logic, but I'm not sure what it does. I also wonder whether it was added as a patch for an edge case and won't work well if most of the nodes in the graph are dense.

Does anyone have experience with graph databases on dense graphs, and should they be considered in such cases?

All opinions are appreciated!

Victor G.
  • Perhaps some clarification: in no way will |E| be on the order of |N|^2, since each node is connected to several million other nodes but there are still a few billion nodes overall. – Victor G. Mar 15 '18 at 18:12
  • What the expert says makes perfect sense to me. Are there any graph algorithms you want to use? – Yoshi Mar 16 '18 at 06:46
  • Yes: shortest path, probably minimum cut, and PageRank. – Victor G. Mar 16 '18 at 12:54

2 Answers


When a graph DB comes to mind, it usually means that multiple tables are linked with each other, which is a perfect use case for a graph DB.

We run JanusGraph at a scale of 20B vertices and 15B edges. It is not a large dense graph where a vertex is connected to tens of millions of vertices, but we still observed the super node case, where a vertex is connected to far more vertices than expected. In our use case, though, a traversal (DFS) always visits at most N child nodes per node and goes to a limited depth, say M, which is absolutely fine considering the number of joins required in non-graph DBs (columnar, relational, Athena, etc.).
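A minimal sketch of that kind of bounded traversal, in plain Python over an in-memory adjacency dict. The function name `bounded_dfs` and its parameters are hypothetical, and this is not JanusGraph or Gremlin code; it only illustrates capping the fan-out (N) and the depth (M).

```python
from typing import Dict, Hashable, Iterator, List, Tuple

Graph = Dict[Hashable, List[Hashable]]


def bounded_dfs(
    graph: Graph, start: Hashable, max_children: int, max_depth: int
) -> Iterator[Tuple[Hashable, int]]:
    """Yield (vertex, depth) pairs found by an iterative DFS from `start`.

    At most `max_children` neighbours are expanded per vertex and the
    traversal stops descending after `max_depth` hops.
    """
    visited = {start}
    stack = [(start, 0)]
    while stack:
        vertex, depth = stack.pop()
        yield vertex, depth
        if depth == max_depth:
            continue  # depth budget used up on this branch
        # Cap the fan-out so a super node cannot blow up the traversal.
        for child in graph.get(vertex, [])[:max_children]:
            if child not in visited:
                visited.add(child)
                stack.append((child, depth + 1))
```

The depth reported here is the depth along the DFS path, not the shortest distance from the start vertex, and which neighbours survive the cap depends on the order in which they are stored.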

The only way (I feel) to get all relations of a node is to do a full DFS, or to inner-join datasets until no common data is found.

Excited to know more about other creative solutions.

Bishnu

I do not have experience with dense graphs in graph databases, but I do not think a dense graph is a problem. Since you are going to use graph algorithms, I suppose you would benefit from a graph database (depending on the algorithms' complexity: the more "hops", the more you benefit from constant edge-traversal time).

A good trade-off could be to use a non-native graph database (like Titan, its follow-up JanusGraph, MongoDB, ...), which actually uses column-based storage (Cassandra, Berkeley DB, ...) as its backend.
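To illustrate the idea, here is a toy sketch of an adjacency list laid out the way a wide-column store might hold it: one row per vertex, one column per edge. The names `rows`, `add_edge`, `neighbours` and `two_hop` are hypothetical, and the in-memory dict is only a stand-in, not the actual API or schema of JanusGraph or Cassandra.

```python
from collections import defaultdict

# row key (vertex id) -> {(edge label, neighbour id): edge properties}
rows = defaultdict(dict)


def add_edge(src, label, dst, **props):
    """Store an outgoing edge as one column in the source vertex's row."""
    rows[src][(label, dst)] = props


def neighbours(src, label):
    """Return neighbours of `src` over `label`, in sorted column order.

    In a real wide-column store this would roughly correspond to a
    column-range read on a single row (an assumption of this sketch).
    """
    return [dst for (lbl, dst) in sorted(rows.get(src, {})) if lbl == label]


def two_hop(src, first_label, second_label):
    """Follow `first_label` and then `second_label`: one row read per hop."""
    result = set()
    for mid in neighbours(src, first_label):
        result.update(neighbours(mid, second_label))
    return result
```

With this layout a few-hop query turns into a handful of row lookups, which is why a short traversal can be cheap on such a backend, while a very wide row (a super node) becomes the expensive case.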

Jakub Moravec