I have a huge graph (for example, 300,000 nodes and 1,000,000 edges) which I'm analyzing with Python on an Ubuntu machine with 32 GB of RAM and 4 CPU cores.
I found graph-tool to be a very efficient tool for computing weighted betweenness centrality, much faster than NetworkX. However, the problem is that loading such a huge graph into memory kills my application (out of memory).
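For reference, this is roughly what I am running now (the file name and the edge property name are placeholders for my actual data):

```python
from graph_tool.all import load_graph
from graph_tool.centrality import betweenness

# Load the graph (~300,000 nodes, ~1,000,000 edges); this step already
# drives memory usage up until the process gets killed
g = load_graph("graph.graphml")

# Weighted betweenness: shortest paths are computed using the 'weight' edge property
vertex_bc, edge_bc = betweenness(g, weight=g.edge_properties["weight"])

# What I ultimately need: a betweenness value for every node, back in Python
scores = {int(v): vertex_bc[v] for v in g.vertices()}
```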
For this reason, I was thinking of switching to Neo4j to store the graph and calculate betweenness centrality.
Can you help me with the following questions?
- Will Neo4j let me directly calculate weighted betweenness centrality (shortest paths computed taking edge weights into account) and pass the per-node results back to Python? (See the sketch after this list for what I would like to end up with.)
- Will using Neo4j for the calculation save me from the out-of-memory kill, or will the problem persist?
- I could not find any performance comparison. Is the calculation of betweenness faster in graph-tool or in Neo4j? How big is the difference?
- Is there a better solution to my problem that I did not consider?
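To make the first question concrete, this is the kind of workflow I am hoping for on the Neo4j side, using the official `neo4j` Python driver. The GDS procedure call, the projected graph name `myGraph`, and the `relationshipWeightProperty` option are just my guess from the docs; whether the weighted variant is actually supported is exactly what I am asking:

```python
from neo4j import GraphDatabase

# Placeholder connection details
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Assumes a projected graph called 'myGraph' with a numeric 'weight' on relationships
query = """
CALL gds.betweenness.stream('myGraph', {relationshipWeightProperty: 'weight'})
YIELD nodeId, score
RETURN nodeId, score
"""

with driver.session() as session:
    # Collect a betweenness score for every node back into Python
    scores = {record["nodeId"]: record["score"] for record in session.run(query)}

driver.close()
```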