I am getting an error when running the code below, which creates a graph in Spark GraphX. I run it through spark-shell with the following command: ./bin/spark-shell -i ex.scala

Input:

My Vertex File looks like this (each line is a vertex of strings):
word1,word2,word3
word1,word2,word3
...
My Edge File looks like this: (edge from vertex 1 to vertex 2)
1,2
1,3

Code:

// Creating Vertex RDD (the input file has 300+ records, each a list of strings separated by the delimiter ',')
//zipWithIndex done to get an index number for all the entries - basically numbering rows
val vRDD: RDD[(VertexId, Array[String])] = (vfile.map(line => line.split(","))).zipWithIndex().map(line => (line._2, line._1))

// Creating Edge RDD using input file
//val eRDD: RDD[Edge[Array[String]]] = (efile.map(line => line.split(",")))

val eRDD: RDD[(VertexId, VertexId)] = efile.map(line => line.split(","))

// Graph creation
val graph = Graph(vRDD, eRDD)

Error:

<console>:52: error: type mismatch;
found   : Array[String]
required: org.apache.spark.graphx.Edge[Array[String]]
          val eRDD: RDD[Edge[Array[String]]] = (efile.map(line =>    line.split(",")))

<console>:57: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId,   org.apache.spark.graphx.VertexId)]
required: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[?]]
Error occurred in an application involving default arguments.
       val graph = Graph(vRDD, eRDD)
yguw
  • Did you build your file? It complains about the line `val eRDD: RDD[Edge[Array[String]]] = (efile.map(line => line.split(",")))`, which is commented out in the code above... – Glennie Helles Sindholt Nov 06 '15 at 13:22
  • But aside from that, your edge RDD needs to be of type `RDD[Edge]` and not an RDD of `VertexId` tuples (and a `VertexId`, BTW, is a `Long` and not a `String`). You should read through the documentation http://spark.apache.org/docs/latest/graphx-programming-guide.html – Glennie Helles Sindholt Nov 06 '15 at 13:26

2 Answers

The Edge has an attr -- what type is your attr? Let's assume it's an Int, and let's initialize it to zero:

Instead of this:

val eRDD: RDD[(VertexId, VertexId)] = efile.map(line => line.split(","))

Try this:

val eRDD: RDD[Edge[Int]] = efile.map { line =>
  val vs = line.split(",")
  Edge(vs(0).toLong, vs(1).toLong, 0)
}
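
If the edge file might contain blank or malformed lines, the parsing step can be made more forgiving. The following is only a minimal sketch: `parseEdge` is a hypothetical helper name, not part of GraphX, and the choice to silently skip bad lines is an assumption that may not suit every data set.

```scala
import scala.util.Try

// Hypothetical helper: parse "src,dst" into a pair of Long ids.
// Returns None for lines that are blank, have the wrong number of
// fields, or contain non-numeric ids.
def parseEdge(line: String): Option[(Long, Long)] =
  line.split(",").map(_.trim) match {
    case Array(src, dst) => Try((src.toLong, dst.toLong)).toOption
    case _               => None
  }

// Usage in spark-shell, assuming `efile` from the question and the imports
// `org.apache.spark.graphx._` and `org.apache.spark.rdd.RDD`:
//   val eRDD: RDD[Edge[Int]] =
//     efile.flatMap(line => parseEdge(line))
//          .map { case (src, dst) => Edge(src, dst, 0) }
```

Dropping bad lines keeps the job running, but it also hides data problems, so counting the skipped lines may be worthwhile.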
David Griffin

Based on the example you gave, I created two files, one with vertices and one with edges:

val vfile = sc.textFile("vertices.txt")
val efile = sc.textFile("edges.txt")

Then create your RDDs of vertices and edges:

val vRDD: RDD[(VertexId, Array[String])] = vfile.map(line => line.split(","))
                               .zipWithIndex()
                               .map(_.swap) // swap puts the index first; it replaces map(line => (line._2, line._1))

// Creating Edge RDD using input file
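// No edge attribute is passed, so each Edge keeps its default attr (null here).
val eRDD: RDD[Edge[(VertexId, VertexId)]] = efile.map(line => {
  // Lines that do not split into exactly two fields will throw a MatchError.
  line.split(",", 2) match {
    case Array(n1, n2) => Edge(n1.toLong, n2.toLong)
  }
})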

Once you have created your vertex and edge RDDs, you can create your graph:

val graph = Graph(vRDD, eRDD)
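
One thing worth checking before relying on the graph: `zipWithIndex` numbers rows from 0, while the sample edge file refers to vertices 1, 2 and 3. The sketch below uses plain Scala collections and made-up sample lines (both are stand-ins, not the real files) to show the mismatch. Whether the edge ids really are 1-based row numbers is an assumption about the data.

```scala
// Hypothetical sample data standing in for the vertex and edge files.
val vertexLines = Seq("a,b,c", "d,e,f", "g,h,i")
val edgeLines = Seq("1,2", "1,3")

// zipWithIndex numbers rows from 0, on Scala collections and on RDDs alike.
val zeroBased: Map[Long, Array[String]] =
  vertexLines.map(_.split(","))
    .zipWithIndex
    .map { case (attrs, i) => (i.toLong, attrs) }
    .toMap

// If the edge file uses 1-based row numbers (an assumption), shift the index.
val oneBased: Map[Long, Array[String]] =
  zeroBased.map { case (id, attrs) => (id + 1, attrs) }

// Every vertex id mentioned in the edge lines.
val edgeIds: Set[Long] =
  edgeLines.flatMap(_.split(",").map(_.trim.toLong)).toSet

// Ids referenced by edges that have no matching vertex row.
val missingWithZeroBased = edgeIds -- zeroBased.keySet
val missingWithOneBased = edgeIds -- oneBased.keySet

// On the RDD, the equivalent shift would be:
//   vfile.map(_.split(",")).zipWithIndex().map { case (attrs, i) => (i + 1, attrs) }
```

With the zero-based numbering, id 3 has no vertex row in this sample. GraphX would still create that vertex, but with the default attribute instead of the words from the file.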
eliasah