I have the following graph in GraphX:

graph.vertices.foreach(println)

(6109253945443866644,"Futurama"@en)
(7558506336564503178,"AccessibleComputing"@en)
(0,null)
(-2278222762001827643,"Programming languages"@en)
(-9007336571746445204,http://dbpedia.org/resource/Category:Presocratic_philosophers)
(-3236797006683951166,http://dbpedia.org/resource/Category:Programming_languages)
(-4159090027031366209,http://dbpedia.org/resource/Anaximenes_of_Miletus)
(7722304331424482609,http://dbpedia.org/resource/Category:Futurama)
(-323898215277667127,http://dbpedia.org/resource/AccessibleComputing)

I applied the connected components algorithm to this graph, and the output is:

ccGraph.vertices.foreach(println)

(6109253945443866644,6109253945443866644)
(7558506336564503178,-323898215277667127)
(0,0)
(-2278222762001827643,-3236797006683951166)
(-9007336571746445204,-9007336571746445204)
(-3236797006683951166,-3236797006683951166)
(-4159090027031366209,-9007336571746445204)
(7722304331424482609,6109253945443866644)
(-323898215277667127,-323898215277667127)

I can't find a way to look up the vertex label (vertex name) for each ID in ccGraph, so that the output changes from
(vertexID, vertexID) => (vertexLabel, vertexLabel)

I tried the following approach, but it fails with the errors shown below it:

    ccGraph.vertices.map({ case arr =>
      val k1 = graph.vertices.lookup(arr(0))
      val k2 = graph.vertices.lookup(arr(1))
      (k1, k2)
    })

<console>:51: error: (org.apache.spark.graphx.VertexId, org.apache.spark.graphx.VertexId) does not take parameters
                  ccGraph.vertices.map({case arr =>  val k1 = graph1.vertices.lookup(arr(0))
                                                                                        ^
<console>:52: error: (org.apache.spark.graphx.VertexId, org.apache.spark.graphx.VertexId) does not take parameters
                                                     val k2 = graph1.vertices.lookup(arr(1))
  • What do you mean by vertex label? When you print a vertex, it's already displayed as (vertex_id, vertex_value). – Kien Truong Jul 04 '16 at 11:46
  • When you print the connected components graph ccGraph, each vertex takes the form (vertexid, vertexid), not the usual (vertexid, vertexvalue). – user3663737 Jul 04 '16 at 12:10

1 Answer

The connected components algorithm generates a new graph in which each vertex value is set to its component ID. If you want the original values back, you have to join that graph with the vertices of the original graph.

ccGraph.joinVertices(graph.vertices) { (id, component_id, old_value) =>
  ...
}
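
As a rough sketch of one way to get (vertexLabel, vertexLabel) pairs: `joinVertices` requires the map function to return the existing vertex type (here the component ID, a `VertexId`), so it cannot swap the ID for a `String` label by itself. The version below uses two plain RDD joins on the vertex RDDs instead. The object `ComponentLabels`, the method `labelPairs`, and the toy data in `main` are hypothetical names made up for illustration, and it assumes the vertex attribute is a `String` label.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical helper, not part of GraphX.
object ComponentLabels {

  // Returns (vertexLabel, componentLabel) for every vertex, where
  // componentLabel is the label of the vertex whose ID is the component ID.
  def labelPairs[ED: ClassTag](graph: Graph[String, ED]): RDD[(String, String)] = {
    val ccGraph = graph.connectedComponents()

    // (vertexId, componentId) joined with (vertexId, label),
    // re-keyed by componentId: (componentId, vertexLabel)
    val byComponent: RDD[(VertexId, String)] =
      ccGraph.vertices.join(graph.vertices).map {
        case (_, (componentId, vertexLabel)) => (componentId, vertexLabel)
      }

    // (componentId, vertexLabel) joined with (componentId, componentLabel)
    byComponent.join(graph.vertices).map {
      case (_, (vertexLabel, componentLabel)) => (vertexLabel, componentLabel)
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("cc-labels")
    val sc = new SparkContext(conf)

    // Toy graph (assumed data): vertices 1 and 2 are connected, 3 is isolated.
    val vertices = sc.parallelize(Seq[(VertexId, String)]((1L, "a"), (2L, "b"), (3L, "c")))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "link")))

    labelPairs(Graph(vertices, edges)).collect().foreach(println)
    sc.stop()
  }
}
```

In spark-shell, where `sc` already exists, only the body of `labelPairs` is needed. If you would rather keep a `Graph` than a plain RDD, `outerJoinVertices` is the variant that allows the vertex attribute type to change.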
Kien Truong