Let's say I have a graph with Double values as edge attributes and I want to find the maximum edge weight of the graph. If I do this:
val max = sc.accumulator(0.0) // max holds the maximum edge weight
g.edges.distinct.collect.foreach { e => if (e.attr > max.value) max.value = e.attr }
How much of this work is done on the master and how much on the executors? I know that collect() brings the entire RDD to the master, so does any of the comparison run in parallel? Is there a better way to find the maximum edge weight?
NOTE:
g.edges.distinct.foreach { e => if (e.attr > max.value) max.value = e.attr } // does not work without collect()
// I use an accumulator because I want to use the maximum edge weight later
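For comparison, here is a minimal sketch of the kind of distributed alternative I am asking about. As far as I understand, tasks on the executors can only add to an accumulator and cannot read its value, which would explain why the version without collect() fails. The sketch replaces the accumulator with a plain reduce, so each partition computes its own maximum and only one Double per partition comes back to the master. The names MaxEdgeWeight and maxEdgeWeight are made up for illustration, and the local master and the tiny sample graph are assumptions added only to make it runnable.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical helper object, not part of GraphX.
object MaxEdgeWeight {
  // Each partition reduces its own edges on an executor; the per-partition
  // results are then merged. reduce throws on an empty RDD, so this assumes
  // the graph has at least one edge. distinct is unnecessary for a maximum.
  def maxEdgeWeight[VD](g: Graph[VD, Double]): Double =
    g.edges.map(_.attr).reduce((a, b) => math.max(a, b))

  def main(args: Array[String]): Unit = {
    // Assumed local setup, just for trying the sketch out.
    val conf = new SparkConf().setAppName("max-edge-weight").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val edges = sc.parallelize(Seq(Edge(1L, 2L, 0.5), Edge(2L, 3L, 2.5), Edge(3L, 1L, 1.0)))
    val g = Graph.fromEdges(edges, defaultValue = 0)

    // The result is an ordinary Double on the master, usable later.
    val max = maxEdgeWeight(g)
    println(max)

    sc.stop()
  }
}
```

Since the result is a normal value on the master, it can be reused later without an accumulator.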
Also, if I want to apply some averaging function to the attributes of edges that have the same srcId and dstId in two different graphs, what is the best way to do it?
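To make the second question concrete, this is a rough sketch of what I mean, using a plain pair-RDD join keyed by (srcId, dstId). It assumes Spark 1.3 or later (so the pair-RDD implicits are picked up automatically), that each (srcId, dstId) pair occurs at most once per graph (duplicates would be multiplied by the join), and that "averaging" means the arithmetic mean of the two weights. AverageSharedEdges and averageSharedEdges are hypothetical names, and I do not know whether this is the best approach.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of GraphX.
object AverageSharedEdges {
  // Keeps only the edges present in both graphs and averages their weights.
  def averageSharedEdges[VD1, VD2](g1: Graph[VD1, Double],
                                   g2: Graph[VD2, Double]): RDD[Edge[Double]] = {
    // Key every edge by its endpoints so matching edges land in the same group.
    val byEndpoints1 = g1.edges.map(e => ((e.srcId, e.dstId), e.attr))
    val byEndpoints2 = g2.edges.map(e => ((e.srcId, e.dstId), e.attr))

    // Inner join: edges that exist in only one graph are dropped.
    byEndpoints1.join(byEndpoints2).map {
      case ((src, dst), (w1, w2)) => Edge(src, dst, (w1 + w2) / 2.0)
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup and sample data, just for trying the sketch out.
    val conf = new SparkConf().setAppName("average-shared-edges").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val g1 = Graph.fromEdges(sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 4.0))), 0)
    val g2 = Graph.fromEdges(sc.parallelize(Seq(Edge(1L, 2L, 3.0), Edge(3L, 4L, 8.0))), 0)

    // Only the edge 1 -> 2 is shared between the two sample graphs.
    averageSharedEdges(g1, g2).collect().foreach(println)

    sc.stop()
  }
}
```

The resulting RDD[Edge[Double]] could be passed to Graph.fromEdges if a graph is needed again.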