I have a graph with many duplicate vertices that carry different attributes (of type Long).


    val vertices: RDD[(VertexId, Long)] ...
    val edges: RDD[Edge[Long]] ...

    val graph = Graph(vertices, edges, 0L)

By default, GraphX merges the attributes of duplicate vertices with the default function:

    VertexRDD(vertices, edges, defaultVal, (a, b) => a)

So which attribute survives in the final graph depends on the order of the vertices.
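To illustrate the order dependence, here is a minimal sketch that uses plain Scala collections in place of the vertex RDD. The helper `mergeDuplicates` and the sample values are hypothetical and only mimic the merge step; they are not part of GraphX.

```scala
// Plain Scala collections stand in for the vertex RDD; no Spark required.
// `mergeDuplicates` is a hypothetical helper, not a GraphX API.
def mergeDuplicates(
    vs: Seq[(Long, Long)],
    merge: (Long, Long) => Long): Map[Long, Long] =
  vs.groupBy(_._1).map { case (id, dups) => id -> dups.map(_._2).reduce(merge) }

// The same duplicate vertex (id 1) seen in two different orders.
val order1 = Seq((1L, 5L), (1L, 3L))
val order2 = order1.reverse

val keepFirst: (Long, Long) => Long = (a, _) => a
val keepMin: (Long, Long) => Long = (a, b) => math.min(a, b)

mergeDuplicates(order1, keepFirst) // Map(1 -> 5)
mergeDuplicates(order2, keepFirst) // Map(1 -> 3), the result changed with the order
mergeDuplicates(order1, keepMin)   // Map(1 -> 3)
mergeDuplicates(order2, keepMin)   // Map(1 -> 3), independent of the order
```

With `(a, b) => a` the result follows the input order, while a commutative and associative function such as `min` gives the same attribute either way.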

Is there any way to set this merge function? For example, I need to merge duplicate vertices with the following function:

    (a, b) => min(a, b)

I could not find a public constructor or any other way to do this.

Do I need to create the Graph with the following code?

    val edgeRDD = EdgeRDD.fromEdges(edges)(classTag[ED], classTag[VD])
      .withTargetStorageLevel(edgeStorageLevel).cache()
    val vertexRDD = VertexRDD(vertices, edgeRDD, defaultVertexAttr, (a, b) => min(a, b))
      .withTargetStorageLevel(vertexStorageLevel).cache()
    GraphImpl(vertexRDD, edgeRDD)

ponkin

1 Answer

You've already answered much of your own question. However, if you just want to control the merge and otherwise keep using the existing constructor, you could do:

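    val vertices: RDD[(VertexId, Long)] ...
    val edges: RDD[Edge[Long]] ...
    // VertexRDD.apply expects an EdgeRDD, so wrap the raw edges first
    val edgeRDD = EdgeRDD.fromEdges[Long, Long](edges)
    val mergedVertices = VertexRDD(vertices, edgeRDD, default, mergeFun)

    val graph = Graph(mergedVertices, edges, 0L)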

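This works because VertexRDD[VD] is a subclass of RDD[(VertexId, VD)] (in your case, VD is Long), so it can be passed wherever the Graph constructor expects a plain vertex RDD. Since the duplicates have already been merged at that point, the constructor's default merge has nothing left to resolve.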
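As a rough end-to-end sketch of this approach with `min` as the merge function: the helper name `graphWithMinMerge` and its parameter names are hypothetical, and the code assumes a Spark version in which `EdgeRDD.fromEdges` and the four-argument `VertexRDD.apply` are public. Because `VertexRDD.apply` takes an `EdgeRDD` rather than an `RDD[Edge[...]]`, the raw edges are wrapped first.

```scala
import org.apache.spark.graphx.{Edge, EdgeRDD, Graph, VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical helper: builds a graph in which the attributes of
// duplicate vertices are merged with min instead of "first one wins".
def graphWithMinMerge(
    vertices: RDD[(VertexId, Long)],
    edges: RDD[Edge[Long]],
    defaultAttr: Long): Graph[Long, Long] = {
  // VertexRDD.apply expects an EdgeRDD, so wrap the raw edges.
  val edgeRDD = EdgeRDD.fromEdges[Long, Long](edges)
  // Deduplicate the vertices up front with the custom merge function.
  val merged = VertexRDD(
    vertices, edgeRDD, defaultAttr, (a: Long, b: Long) => math.min(a, b))
  // VertexRDD[Long] is an RDD[(VertexId, Long)], so the public
  // Graph constructor accepts it directly.
  Graph(merged, edges, defaultAttr)
}
```

Called as `graphWithMinMerge(vertices, edges, 0L)`, this keeps the public `Graph(...)` constructor and avoids touching `GraphImpl` directly.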

Holden