I have a very large, weighted graph in Azure Cosmos DB. The numbers of vertices and edges are in the billions, and the database is several TB in size. I am trying to cluster the graph on Spark using a custom clustering algorithm.
I understand this can be done using Spark and GraphFrames. I also found some older algorithms online that use GraphX and the Pregel framework, but my understanding is that GraphFrames is now the preferred way to implement this, and I cannot find any examples for it. After watching several videos and reading blogs, I was able to create a small graph and experiment with it in GraphFrames using the built-in APIs (LPA, BFS, etc.).
My Questions:
How do I implement graph clustering using GraphFrames? Is there any example of a custom graph clustering algorithm using GraphFrames that runs in a distributed fashion? Will simply using a GraphFrame/DataFrame and writing regular clustering code take care of the distributed processing, or do I have to write it in a certain way (similar to GraphX or Pregel)?
How do I load the entire graph and run my clustering algorithm? When I load it into a GraphFrame, will the entire dataset (several TB) be loaded into memory? Or does Spark automatically load only what is necessary, or should I write custom code to load only what is needed during processing?
Apologies if these questions are basic; I am new to Spark, clustering, and GraphFrames.