3

I have a DataFrame dfMaster with three columns: vertex1, vertex2 and weight. I'm trying to create a GraphX directed weighted graph whose vertices come from vertex1 and vertex2, with an edge from each vertex1 to its vertex2 carrying the corresponding weight. I can create the edge and vertex DataFrames by doing:

val edgeDF = dfMaster.select($"vertex1", $"vertex2", $"weight").distinct()
val vertexDF = dfMaster.select("vertex1").unionAll(dfMaster.select("vertex2")).distinct()

How do I then load this into a weighted graph? Thanks for the help.

mt88

1 Answer

4

As far as I know, Spark GraphX currently only supports creating graphs from RDDs. The main graph-creation methods are on the `Graph` companion object (for example `Graph.apply` and `Graph.fromEdges`) and in `GraphLoader`.

For your case, I suggest the following code:

import org.apache.spark.sql.Row
import org.apache.spark.graphx.{Graph, Edge}

val edgeDF = dfMaster.select($"vertex1", $"vertex2", $"weight").distinct()

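// .rdd yields an RDD[Row]; pattern-match each row into a GraphX Edge.
val edgeRDD = edgeDF.rdd.map {
  case Row(srcId: Double, dstId: Double, wgt: Double) => Edge[Double](srcId.toLong, dstId.toLong, wgt)
}

// Vertices are inferred from the edges; each one gets the default attribute 0.
val graph = Graph.fromEdges[Int, Double](edgeRDD, 0)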

The fromEdges method above infers the vertices from the edges and sets 0 as their attribute.

Assumptions:

  • vertex1, vertex2 and weight are columns of type Double;
  • There is no attribute information for vertices, so it's ok if all of them are created with 0.
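
If the Double values in vertex1 and vertex2 are not whole numbers, `toLong` truncates them and can merge distinct vertices (1.2 and 1.7 both become 1). A minimal sketch of one way around that, assuming a small hypothetical sample in place of dfMaster: give each distinct value its own id with `zipWithUniqueId`, then join those ids back onto the edges. The names `rows` and `idByValue` are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Local context for illustration only; in spark-shell, reuse the existing `sc`.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("vertex-id-sketch"))

// Hypothetical sample rows standing in for (vertex1, vertex2, weight).
val rows: RDD[(Double, Double, Double)] = sc.parallelize(Seq(
  (1.2, 1.7, 0.5),
  (1.7, 3.0, 2.0)
))

// Assign one unique Long id to every distinct vertex value.
val idByValue: RDD[(Double, VertexId)] =
  rows.flatMap { case (src, dst, _) => Seq(src, dst) }.distinct().zipWithUniqueId()

// Replace both endpoints with their ids via two joins.
val edges: RDD[Edge[Double]] = rows
  .map { case (src, dst, w) => (src, (dst, w)) }
  .join(idByValue) // (src, ((dst, w), srcId))
  .map { case (_, ((dst, w), srcId)) => (dst, (srcId, w)) }
  .join(idByValue) // (dst, ((srcId, w), dstId))
  .map { case (_, ((srcId, w), dstId)) => Edge(srcId, dstId, w) }

// Keep the original Double value as the vertex attribute.
val vertices: RDD[(VertexId, Double)] = idByValue.map(_.swap)
val graph = Graph(vertices, edges)
```

The ids produced by `zipWithUniqueId` are unique but not necessarily consecutive. Storing the original value as the vertex attribute keeps it possible to map results back to the input data.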
Daniel de Paula
  • Hey, thanks for the help. The types of vertex1, vertex2, and weight are all doubles. I went ahead and modified your code and changed it to doubles. However, when I ran the edgeRDD line, I got three errors at `Edge[Double](srcId, dstId, wgt)` with the error message: type mismatch, found Double, required org.apache.spark.graphx.VertexId. Do you happen to know what this means? – mt88 May 05 '16 at 23:17
  • Vertex ids must be of type Long (or VertexId, which is a type alias for Long) – Daniel de Paula May 05 '16 at 23:18
  • If you can't get unique values for your vertices by casting to Long, I'm afraid you will have to use something like `zipWithUniqueId`. Please let me know if `toLong` is enough for you. – Daniel de Paula May 05 '16 at 23:28
  • Hey, thanks again for the help. Just to clarify, this will only create an edge from vertex1 to vertex2, right? Not in both directions? The reason I ask is that when I try to get the vertices in the graph with no in-degree by doing `val Trees = inviteGraph.inDegrees.filter { case (id, indegree) => indegree == 0 }`, I get no nodes back when I take 5 and print. I do get nodes back when I set it to `== 1`. Thanks for helping again. – mt88 May 06 '16 at 00:28
  • You are correct. This approach creates edges with `vertex1` mapped to `srcId` and `vertex2` mapped to `dstId`. However, if I may repeat, I recommend making sure that calling `toLong` on the double-typed vertices did not collapse distinct values into duplicates. Each `Long` value passed to the constructor of `Edge` must represent one specific vertex in your graph. – Daniel de Paula May 06 '16 at 00:36
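
On the in-degree question in the comments above: `inDegrees` only returns vertices that have at least one incoming edge, so filtering it for 0 comes back empty regardless of the graph. A minimal sketch of one way to find the vertices with no incoming edges, using a small hypothetical graph: join the in-degrees onto every vertex with `outerJoinVertices` and default the missing ones to 0. The name `roots` is illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Local context for illustration only; in spark-shell, reuse the existing `sc`.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("in-degree-sketch"))

// Hypothetical directed graph: 1 -> 2 -> 3, so only vertex 1 has no incoming edge.
val edges = sc.parallelize(Seq(Edge(1L, 2L, 0.5), Edge(2L, 3L, 1.0)))
val graph = Graph.fromEdges[Int, Double](edges, 0)

// Attach the in-degree to every vertex; vertices absent from inDegrees get 0.
val roots = graph
  .outerJoinVertices(graph.inDegrees) { (_, _, deg) => deg.getOrElse(0) }
  .vertices
  .filter { case (_, inDeg) => inDeg == 0 }
```

Here `roots` holds (vertex id, in-degree) pairs for the vertices that nothing points to.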