I'm new to Spark and Scala, and I'm trying to carry out a simple task: creating a graph from data in a text file.
From the documentation at
https://spark.apache.org/docs/0.9.0/api/graphx/index.html#org.apache.spark.graphx.Graph$@fromEdges[VD,ED]%28RDD[Edge[ED]],VD%29%28ClassTag[VD],ClassTag[ED]%29:Graph[VD,ED]
I can see that I can create a graph from tuples of vertex IDs.
My simple text file looks like this, where each entry is a vertex:
v1 v3
v2 v1
v3 v4
v4
v5 v3
When I read the data from the file with
val myVertices = myData.map(line => line.split(" "))
I get an RDD[Array[String]].
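A minimal sketch of that step, assuming each line holds one vertex name optionally followed by a second one (read here as source, then destination). Pattern matching on the split array keeps the two-token lines as edge pairs and sets aside single-token lines such as v4. It uses plain Scala collections so it runs without a cluster; an RDD also has map, filter and collect with a partial function, so the same lambdas should carry over to myData. The helper names are hypothetical.

```scala
object ParseEdgeFile {
  // Sample lines copied from the text file above.
  val lines: Seq[String] = Seq("v1 v3", "v2 v1", "v3 v4", "v4", "v5 v3")

  // Split each non-blank line on runs of whitespace (the question splits on a single space).
  private def tokens(lines: Seq[String]): Seq[Array[String]] =
    lines.map(_.trim).filter(_.nonEmpty).map(_.split("\\s+"))

  // Hypothetical helper: lines with exactly two tokens become (source, destination) pairs.
  def parseEdges(lines: Seq[String]): Seq[(String, String)] =
    tokens(lines).collect { case Array(src, dst) => (src, dst) }

  // Hypothetical helper: lines with a single token name a vertex but list no edge on that line.
  def singleVertices(lines: Seq[String]): Seq[String] =
    tokens(lines).collect { case Array(v) => v }

  def main(args: Array[String]): Unit = {
    println(parseEdges(lines))     // pairs of vertex names
    println(singleVertices(lines)) // names taken from single-token lines
  }
}
```

Lines with more than two tokens are silently dropped by both helpers; if such lines can occur, they may deserve an explicit error instead.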
My questions are:
1. If this is the right way to approach the problem, how do I turn the RDD[Array[String]] into the correct format, which according to the documentation is RDD[(VertexId, VertexId)]? (Also, VertexId has to be of type Long, and I am working with strings.)
2. Is there an alternative, easier way to construct a graph from a CSV file with a similar structure?
Any suggestion would be very welcome. Thanks!