I'm building a graph from an RDD of tuples of source and destination nodes, like this:

    Graph.fromEdgeTuples(rawEdges = edgeList, 1)

  1. First off, I did not quite understand what the second parameter is. From the documentation,

    defaultValue the vertex attributes with which to create vertices referenced by the edges

    I still don't get it.

  2. Second, I cannot find anything to compute the size of the biggest component. The result of the connectedComponents method does not seem to offer foreach, map, reduceByKey, or anything similar.

zero323
Bob

1 Answer

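  1. defaultValue is the attribute assigned to every vertex created from the edge tuples. The edges themselves get the attribute 1 regardless of defaultValue (or, if uniqueEdges is set, the number of duplicate edges that were merged):

    val graph = Graph.fromEdgeTuples(sc.parallelize(Seq(
      (1L, 2L), (2L, 3L), (4L, 5L))), defaultValue = 42)
    
    graph.vertices.map(_._2).distinct.collect
    // Array[Int] = Array(42)
    
    graph.edges.map(_.attr).distinct.collect 
    // Array[Int] = Array(1)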
    
  2. Extract the component ids and do a word count:

    val ids = graph.connectedComponents.vertices map((v: (Long, Long)) => v._2)
    ids.map((_, 1L)).reduceByKey(_ + _)
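
The word count above yields one (componentId, size) pair per component, so the biggest component is the pair with the largest count. A minimal standalone sketch of that last step follows. It assumes Spark 1.3 or later, where reduceByKey is available on pair RDDs without extra imports. The object name LargestComponent, the helper largest, the local master, and the application name are placeholders chosen for illustration, not part of GraphX.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

object LargestComponent {
  // Hypothetical helper: returns (componentId, size) of the biggest component.
  // Throws on an empty edge RDD, because RDD.reduce fails on empty input.
  def largest(edges: RDD[(VertexId, VertexId)]): (VertexId, Long) = {
    val graph = Graph.fromEdgeTuples(edges, 1)

    // connectedComponents labels every vertex with the lowest vertex id
    // in its component; count how many vertices carry each label.
    val componentSizes = graph.connectedComponents().vertices
      .map { case (_, componentId) => (componentId, 1L) }
      .reduceByKey(_ + _)

    // Keep the pair with the larger count.
    componentSizes.reduce((a, b) => if (a._2 >= b._2) a else b)
  }

  def main(args: Array[String]): Unit = {
    // Placeholder configuration for a local run.
    val conf = new SparkConf().setMaster("local[*]").setAppName("largest-component")
    val sc = new SparkContext(conf)
    try {
      val edgeList = sc.parallelize(Seq((1L, 2L), (2L, 3L), (4L, 5L)))
      val (id, size) = largest(edgeList)
      println(s"component $id has $size vertices") // component 1 has 3 vertices
    } finally {
      sc.stop()
    }
  }
}
```

In spark-shell the same result should be obtainable by appending .values.max to the reduceByKey expression if only the size is needed.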
    
Ivan Chaer