I have a vertices file and an edges file from which I want to construct a graph using the Graph.apply method in GraphX.
My input files are in json.gz format, so I am using Spark's sqlContext.read.json function to import the data.
val vertices = sqlContext.read.json("vertices.json.gz")
res3: org.apache.spark.sql.DataFrame = [index: bigint, point: array<double>, toid: string]
Using vertices.show() I can inspect the input.
+-----+--------------------+--------------------+
|index|               point|                toid|
+-----+--------------------+--------------------+
|    1|[508180.748, 1953...|osgb4000000031043205|
|    2|[508163.122, 1953...|osgb4000000031043206|
|    3|[508172.075, 1953...|osgb4000000031043207|
|    4|[508513.0, 196023.0]|osgb4000000031043208|
|    5|[514358.399, 1503...|osgb4000000029797733|
I have an edges file that I load in the same way.
I now want to construct a graph with GraphX using Graph.apply. I know that this method does not accept DataFrame inputs, only RDD inputs.
To convert the DataFrames I tried the .rdd method, which gives:
val vert_rdd = vertices.rdd
vert_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[12] at rdd at <console>:36
However, this gives an RDD of sql.Row objects rather than an RDD of the type GraphX expects. It is therefore not accepted as input when creating a graph with Graph.apply.
Is there a more direct way to load JSON files into GraphX, or a different way to convert a DataFrame to an RDD?
Edit.
Ultimately, the vertices file must be converted to a VertexRDD, where index takes the place of the VertexId and the point array and toid string are the vertex attributes.
The edges file must be converted to an EdgeRDD.
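For the vertices, the mapping described above might be sketched as follows. This is only a sketch under assumptions: toVertexRDD and the VertexConversion object are hypothetical names, and the attribute is assumed to be a plain (point, toid) tuple. As far as I understand, Graph.apply takes an RDD[(VertexId, VD)] for its vertices, so a VertexRDD may not need to be built by hand.

```scala
import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

object VertexConversion {
  // Hypothetical helper (not part of GraphX): turns the vertices DataFrame into
  // (VertexId, attribute) pairs. Assumed attribute type: (point coordinates, toid).
  def toVertexRDD(vertices: DataFrame): RDD[(VertexId, (List[Double], String))] =
    vertices.rdd.map { row =>
      // bigint column, read as Long (VertexId is an alias for Long)
      val id: VertexId = row.getAs[Long]("index")
      // array<double> column, copied into an immutable List
      val point = row.getAs[scala.collection.Seq[Double]]("point").toList
      // string column
      val toid = row.getAs[String]("toid")
      (id, (point, toid))
    }
}
```

With the DataFrame loaded earlier, this would be called as VertexConversion.toVertexRDD(vertices).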
The edges file looks like this:
edges: org.apache.spark.sql.DataFrame = [index: bigint, length: double, nature: string, negativeNode: string, polyline: array<double>, positiveNode: string, term: string, toid: string]
+-----+------------------+------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|index|            length|            nature|        negativeNode|            polyline|        positiveNode|                term|                toid|
+-----+------------------+------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|    1| 112.8275895775762|Single Carriageway|osgb4000000023183407|[492019.481, 1565...|osgb4000000023183409|Private Road - Re...|osgb4000000023296573|
|    2|141.57731318733806|Single Carriageway|osgb4000000023763485|[492144.493, 1567...|osgb4000000023183408|Private Road - Re...|osgb4000000023296574|
|    3|190.23352139011513|Single Carriageway|osgb4000000023183650|[492835.25, 15687...|osgb4000000023183652|Private Road - Re...|osgb4000000023296638|
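For the edges, one possible sketch is below. It rests on several assumptions that I have not confirmed: that negativeNode and positiveNode hold the toid of a vertex, that negativeNode can be treated as the source and positiveNode as the destination, and that length is the only edge attribute needed. toEdgeRDD and EdgeConversion are hypothetical names.

```scala
import org.apache.spark.graphx.Edge
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

object EdgeConversion {
  // Hypothetical helper: resolves the toid strings in negativeNode / positiveNode
  // to the numeric index of the matching vertex, then builds Edge objects.
  // Assumptions: node references match vertices.toid, negativeNode is the source,
  // positiveNode is the destination, and length is the only edge attribute kept.
  def toEdgeRDD(vertices: DataFrame, edges: DataFrame): RDD[Edge[Double]] = {
    // Lookup table from toid to index, renamed once per endpoint so the
    // joins below can match on the edge column names.
    val ids = vertices.select("toid", "index")
    val src = ids.toDF("negativeNode", "srcId")
    val dst = ids.toDF("positiveNode", "dstId")

    edges
      .select("negativeNode", "positiveNode", "length")
      .join(src, "negativeNode") // inner join: edges with an unknown endpoint are dropped
      .join(dst, "positiveNode")
      .rdd
      .map(r => Edge(r.getAs[Long]("srcId"), r.getAs[Long]("dstId"), r.getAs[Double]("length")))
  }
}
```

The resulting RDD[Edge[Double]] has the shape Graph.apply takes as its second argument, alongside an RDD of (VertexId, attribute) pairs as the first.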