
I'm playing around with GraphX. I've built a graph and I'm trying to update the weight of a relation (an edge):

import org.apache.spark.rdd.RDD
import org.apache.spark.graphx._
def pageHash(title: String): Long = title.toLowerCase.replace(" ", "").hashCode.toLong

val vertexArray = Array(
  (pageHash("Alice"), "Alice"),
  (pageHash("Bob"), "Bob"),
  (pageHash("Charlie"), "Charlie"),
  (pageHash("David"), "David"),
  (pageHash("Ed"), "Ed"),
  (pageHash("Fran"), "Fran")
)
val edgeArray = Array(
  Edge(pageHash("Bob"), pageHash("Alice"), 7),
  Edge(pageHash("Bob"), pageHash("David"), 2),
  Edge(pageHash("Charlie"), pageHash("Bob"), 4),
  Edge(pageHash("Charlie"), pageHash("Fran"), 3),
  Edge(pageHash("David"), pageHash("Alice"), 1),
  Edge(pageHash("Ed"), pageHash("Bob"), 2),
  Edge(pageHash("Ed"), pageHash("Charlie"), 8),
  Edge(pageHash("Ed"), pageHash("Fran"), 3)
)



val vertexRDD: RDD[(Long, (String))] = sc.parallelize(vertexArray)
val edgeRDD: RDD[Edge[Int]] = sc.parallelize(edgeArray)
val graph: Graph[(String), Int] = Graph(vertexRDD, edgeRDD)

graph.triplets.filter(triplet => triplet.srcAttr.equals("Bob")&&triplet.dstAttr.equals("Alice")).collect()

graph.triplets.filter(triplet => triplet.srcAttr.equals("Bob")&&triplet.dstAttr.equals("Alice")).
    map(triplet=> triplet.attr.toString.toInt+1).collect()

The second expression only returns the incremented value; the graph itself is unchanged. I'm not able to increase the weight of the edge. Is there any way to do this?

tourist

1 Answer


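You can't update an Edge in place, since GraphX graphs are immutable, but you can get the same effect: add a new Edge with the same src and dst, carrying the delta of the weight, to the edges RDD of your Graph, and then call groupEdges on the resulting graph to merge the parallel edges. Note that groupEdges only merges edges that sit in the same partition, so the graph should be repartitioned with partitionBy first. In other words, if you have the following graph: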

val edges = sc.parallelize(Array(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0)))
val vertices = sc.parallelize(Array((1L, "Bob"), (2L, "Tom"), (3L, "Jerry")))

val graph = Graph(vertices, edges)

You can add 1.0 to the weight of one of the edges like this:

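// Union the existing edges with a "delta" edge, then merge parallel edges.
// groupEdges only merges co-located edges, so partitionBy must come first.
val newGraph = Graph(graph.vertices, graph.edges.union(
  sc.parallelize(Array(Edge(2L, 3L, 1.0)))
)).partitionBy(PartitionStrategy.EdgePartition2D).groupEdges((a, b) => a + b)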
David Griffin
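As a side note that is not part of the answer above: when the goal is to change the weight of edges that already exist, rather than to add parallel edges, `mapTriplets` may be a more direct fit, because it can see the source and destination vertex attributes that the question filters on. The following is only a sketch under assumptions: the object name `BumpEdgeWeight`, the helper `bump`, and the local `SparkContext` setup are made up for illustration, and the graph is a cut-down version of the one in the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

object BumpEdgeWeight {
  // Same hashing scheme as in the question.
  def pageHash(title: String): Long = title.toLowerCase.replace(" ", "").hashCode.toLong

  // Hypothetical helper: returns a NEW graph in which every edge whose
  // source/destination vertex attributes match (src, dst) has delta added
  // to its weight. The input graph is left unchanged.
  def bump(graph: Graph[String, Int], src: String, dst: String, delta: Int): Graph[String, Int] =
    graph.mapTriplets(t => if (t.srcAttr == src && t.dstAttr == dst) t.attr + delta else t.attr)

  // Small subset of the question's graph, built on the given context.
  def sampleGraph(sc: SparkContext): Graph[String, Int] = {
    val vertices = sc.parallelize(Seq("Alice", "Bob", "David").map(n => (pageHash(n), n)))
    val edges = sc.parallelize(Seq(
      Edge(pageHash("Bob"), pageHash("Alice"), 7),
      Edge(pageHash("Bob"), pageHash("David"), 2)
    ))
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local context; in spark-shell the existing `sc` would be used instead.
    val sc = new SparkContext(new SparkConf().setAppName("bump-edge-weight").setMaster("local[*]"))
    val updated = bump(sampleGraph(sc), "Bob", "Alice", 1)
    updated.triplets
      .map(t => s"${t.srcAttr} -> ${t.dstAttr}: ${t.attr}")
      .collect()
      .foreach(println)
    sc.stop()
  }
}
```

Because the result is a new `Graph`, it has to be assigned to a value (or reassigned to a `var`) for the change to be visible in later operations.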
    How efficient is the union operation if I'm trying to add an edge iteratively, say 1 billion times? My current understanding is that each iteration creates a new RDD with the extra edge and throws away the graph generated in the previous iteration. Is it an efficient operation in this scenario? – tourist Apr 02 '16 at 12:17
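
One way to think about the concern in this comment, offered as a sketch rather than a measured answer: every `union` call adds another step to the RDD lineage, so unioning one edge at a time tends to scale poorly. If the updates can be collected up front, they can be applied in a single pass. The example below assumes exactly that; the object name `BatchedEdgeUpdates`, the helper `applyDeltas`, and the sample deltas are hypothetical. It sums weights per (src, dst) pair with `reduceByKey` instead of `groupEdges`, which is a swapped-in technique and not the approach from the answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

object BatchedEdgeUpdates {
  // Hypothetical helper: applies a whole batch of weight deltas at once.
  // Weights of all edges sharing the same (src, dst) pair are summed, so a
  // delta for an existing edge increases its weight, and a delta for a
  // missing pair creates a new edge.
  def applyDeltas(graph: Graph[String, Double], deltas: RDD[Edge[Double]]): Graph[String, Double] = {
    val merged = graph.edges.union(deltas)
      .map(e => ((e.srcId, e.dstId), e.attr)) // key each edge by its endpoints
      .reduceByKey(_ + _)                     // sum the weights per endpoint pair
      .map { case ((src, dst), w) => Edge(src, dst, w) }
    Graph(graph.vertices, merged)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local context; in spark-shell the existing `sc` would be used instead.
    val sc = new SparkContext(new SparkConf().setAppName("batched-edge-updates").setMaster("local[*]"))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 2.0)))
    val vertices = sc.parallelize(Seq((1L, "Bob"), (2L, "Tom"), (3L, "Jerry")))
    val graph = Graph(vertices, edges)

    // Made-up batch of updates, including two deltas for the same edge.
    val deltas = sc.parallelize(Seq(Edge(2L, 3L, 1.0), Edge(2L, 3L, 0.5), Edge(1L, 2L, 2.0)))

    applyDeltas(graph, deltas).edges.collect().foreach(println)
    sc.stop()
  }
}
```

Whether this is fast enough for a billion updates depends on the cluster and on how the deltas are produced, so it is worth benchmarking on real data before relying on it.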