I am working on a project that involves a random walk on a large graph. I coded it in Python using networkx, but the graph soon became too big to fit in memory, so I realised that I need to switch to a distributed system. As far as I understand, this means the following:
- I will need a graph database (Titan, Neo4j, etc.).
- I will need a graph processing framework such as Apache Giraph on Hadoop or GraphX on Spark.
First, are there enough Python APIs for these tools to let me keep coding in Python, or should I switch to Java?
Second, I couldn't find clear documentation on how to write a custom traversal function (in either Giraph or GraphX) to implement the random walk algorithm.
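
To make the question concrete, here is a minimal sketch of the kind of random walk I mean, written in plain Python rather than networkx. It is not my actual project code: the `random_walk` function, the `neighbors` callable, and the toy graph are illustrative assumptions. The walk only ever asks for the neighbors of the current node, so that lookup is the one piece I would expect to replace with a graph database query or a framework-specific traversal.

```python
import random


def random_walk(neighbors, start, steps, rng=None):
    """Simple unweighted random walk.

    neighbors: callable mapping a node to a list of adjacent nodes
               (hypothetical hook; in-memory here, but it could be
               backed by a database lookup instead).
    start:     node where the walk begins.
    steps:     maximum number of moves to make.
    rng:       optional random.Random instance for reproducibility.
    """
    rng = rng or random.Random()
    path = [start]
    node = start
    for _ in range(steps):
        nbrs = neighbors(node)
        if not nbrs:
            # Dead end: no outgoing edges, so the walk stops early.
            break
        node = rng.choice(nbrs)
        path.append(node)
    return path


# Toy adjacency list standing in for the real (too large) graph.
toy_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"], "d": []}

walk = random_walk(lambda n: toy_graph.get(n, []), "a", steps=10,
                   rng=random.Random(0))
print(walk)
```

What I am looking for is the equivalent of the `neighbors` call and the step loop in Giraph or GraphX.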