Finding maximum edge weight in Spark GraphX

Question

Let's say I have a graph with Double edge attributes and I want to find the maximum edge weight of the graph. If I do this:

val max = sc.accumulator(0.0) // max holds the maximum edge weight
g.edges.distinct.collect.foreach { e => if (e.attr > max.value) max.value = e.attr }

I want to ask how much work is done on the master and how much on the executors, since I know that the collect() method brings the entire RDD to the master. Does any parallelism happen? Is there a better way to find the maximum edge weight?

NOTE:

g.edges.distinct.foreach { e => if (e.attr > max.value) max.value = e.attr } // does not work without the collect() method
// I use an accumulator because I want to use the max edge weight later

Also, if I want to apply some averaging function to the attributes of edges that have the same srcId and dstId in two graphs, what is the best way to do it?

zero323 · Accepted Answer · 2015-08-28T11:59:56.417

5

You can either aggregate:

graph.edges.aggregate(Double.NegativeInfinity)(
  (m, e) => e.attr.max(m),
  (m1, m2) => m1.max(m2)
)

or map and take max:

graph.edges.map(_.attr).max

Regarding your attempts:

If you collect, all the data is processed sequentially on the driver, so there is no reason to use an accumulator.
Without collect, it doesn't work because accumulators are write-only from a worker's perspective.
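
For reference, here is a minimal local-mode sketch of both approaches. The three sample edges and the names MaxEdgeWeight, buildGraph, maxByAggregate and maxByMap are made up for illustration; it assumes Spark core and GraphX are on the classpath. In both variants the per-partition maxima are computed on the executors and only the partial results are combined on the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

object MaxEdgeWeight {
  // Hypothetical sample graph: three edges with Double weights.
  def buildGraph(sc: SparkContext): Graph[Int, Double] = {
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 0.5),
      Edge(2L, 3L, 2.5),
      Edge(3L, 1L, 1.0)
    ))
    Graph.fromEdges(edges, defaultValue = 0)
  }

  // Fold each partition into a running maximum, then merge the partial maxima.
  def maxByAggregate(g: Graph[Int, Double]): Double =
    g.edges.aggregate(Double.NegativeInfinity)(
      (m, e) => math.max(m, e.attr),
      (m1, m2) => math.max(m1, m2)
    )

  // Project the weights and let RDD.max do the reduction.
  def maxByMap(g: Graph[Int, Double]): Double =
    g.edges.map(_.attr).max()

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("max-edge-weight")
    val sc = new SparkContext(conf)
    try {
      val g = buildGraph(sc)
      println(maxByAggregate(g)) // 2.5 for the sample edges above
      println(maxByMap(g))       // 2.5 for the sample edges above
    } finally {
      sc.stop()
    }
  }
}
```

The result is an ordinary Double on the driver, so it can be reused later without an accumulator. Note that max throws on an empty RDD, while the aggregate version returns Double.NegativeInfinity.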

edited Aug 28 '15 at 11:59

answered Aug 28 '15 at 11:54
