
I'm trying to create a Graph using some Google Web Graph data which can be found here:

https://snap.stanford.edu/data/web-Google.html

import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD



val textFile = sc.textFile("hdfs://n018-data.hursley.ibm.com/user/romeo/web-Google.txt")
val arrayForm = textFile.filter(_.charAt(0)!='#').map(_.split("\\s+")).cache()
val nodes = arrayForm.flatMap(array => array).distinct().map(_.toLong)
val edges = arrayForm.map(line => Edge(line(0).toLong,line(1).toLong))

val graph = Graph(nodes,edges)

Unfortunately, I get this error:

<console>:27: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[Long]
 required: org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId, ?)]
Error occurred in an application involving default arguments.
       val graph = Graph(nodes,edges)

So how can I create a VertexId object? As I understand it, passing a Long should be sufficient.

Any ideas?

Thanks a lot!

romeo

Romeo Kienzler

2 Answers


Not exactly. If you take a look at the signature of the apply method of the Graph object you'll see something like this (for a full signature see API docs):

apply[VD, ED](
    vertices: RDD[(VertexId, VD)], edges: RDD[Edge[ED]], defaultVertexAttr: VD)

As the description says:

Construct a graph from a collection of vertices and edges with attributes.

Because of that, you cannot simply pass an RDD[Long] as the vertices argument (an RDD[Edge[Nothing]] as edges won't work either). Every vertex and every edge needs an attribute:

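// Each vertex is a (VertexId, attribute) pair; here the attribute is an empty Option.
// An explicit lambda is needed: (_.toLong, None) would be parsed as a tuple
// containing a function, not as a function returning a tuple.
val nodes: RDD[(VertexId, Option[String])] = arrayForm.
    flatMap(array => array).
    map(x => (x.toLong, Option.empty[String]))

// Each edge carries an attribute as well; here an empty String.
val edges: RDD[Edge[String]] = arrayForm.
    map(line => Edge(line(0).toLong, line(1).toLong, ""))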

Note that:

Duplicate vertices are picked arbitrarily

so calling .distinct() on nodes is unnecessary in this case.
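
To tie the pieces above together, here is a minimal sketch of the full construction, meant to be pasted into spark-shell. The four sample lines are made up for illustration and only mimic the format of web-Google.txt. The local master setting and the app name are assumptions too; inside spark-shell, getOrCreate should simply hand back the existing context.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: local mode for illustration only. In spark-shell this returns
// the already running context instead of creating a new one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("graph-apply-sketch"))

// Hypothetical sample data in the "src<TAB>dst" layout of web-Google.txt.
val arrayForm: RDD[Array[String]] = sc
  .parallelize(Seq("# FromNodeId\tToNodeId", "0\t1", "0\t2", "1\t2", "2\t3"))
  .filter(!_.startsWith("#"))
  .map(_.split("\\s+"))

// Vertices are (VertexId, attribute) pairs. Ids repeat here on purpose:
// Graph.apply keeps one arbitrary entry per id, so no distinct() is needed.
val nodes: RDD[(VertexId, Option[String])] =
  arrayForm.flatMap(array => array).map(x => (x.toLong, Option.empty[String]))

// Edges carry an attribute too; an empty String stands in for "no data".
val edges: RDD[Edge[String]] =
  arrayForm.map(line => Edge(line(0).toLong, line(1).toLong, ""))

val graph: Graph[Option[String], String] = Graph(nodes, edges)

println(s"${graph.vertices.count()} vertices, ${graph.edges.count()} edges")
```

The attribute types (Option[String] and String) are placeholders; any type works as long as the vertex RDD holds pairs and the edge RDD holds Edge values with an explicit attribute.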

If you want to create a Graph without attributes you can use Graph.fromEdgeTuples.
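
A minimal sketch of that route, again for spark-shell and with made-up sample edges: Graph.fromEdgeTuples takes an RDD of (source, destination) id pairs plus a default vertex value, so no separate vertex RDD is needed. The default value 0, the local master setting, and the app name are arbitrary choices for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, VertexId}
import org.apache.spark.rdd.RDD

// Assumption: local mode for illustration only. In spark-shell this returns
// the already running context instead of creating a new one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("from-edge-tuples-sketch"))

// Hypothetical sample edges in the "src<TAB>dst" layout of web-Google.txt.
val rawEdges: RDD[(VertexId, VertexId)] = sc
  .parallelize(Seq("0\t1", "0\t2", "1\t2", "2\t3"))
  .map(_.split("\\s+"))
  .map(parts => (parts(0).toLong, parts(1).toLong))

// Vertices are derived from the edge endpoints and all receive the default
// value 0. Without a uniqueEdges strategy, every edge attribute is set to 1.
val graph: Graph[Int, Int] = Graph.fromEdgeTuples(rawEdges, 0)

println(s"${graph.vertices.count()} vertices, ${graph.edges.count()} edges")
```

The result is a Graph[Int, Int] here only because the default value is an Int; the vertex attribute type follows whatever default is passed.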

zero323
  • Hi, following back to the question in the title, is there no way to create a vertex object on the fly like you can with edges? For example val test = Edge(1, 1, 1) . I tried val test = Vertex(1, 1) and haven't been able to find any constructor online. – mt88 May 11 '16 at 22:39
  • @mt88 There is no need for that. `Vertex` is just a `Tuple2[VertexId, T]` where `VertexId` is an alias for `Long`. – zero323 May 12 '16 at 09:45

The error message says that nodes must be of type RDD[(Long, something)]. The first element of the tuple is the VertexId, and the second element can be anything, for example a String with a node description. Try simply repeating the VertexId:

val nodes = arrayForm
             .flatMap(array => array)
             .distinct()
             .map(x => (x.toLong, x.toLong))
Nikita