
I have a vertices file and an edges file from which I want to construct a graph with GraphX's Graph.apply method.

My input files are gzipped JSON (.json.gz), so I am importing the data with Spark's sqlContext.read.json function:

```
val vertices = sqlContext.read.json("vertices.json.gz")

res3: org.apache.spark.sql.DataFrame = [index: bigint, point: array<double>, toid: string]
```

Calling vertices.show() lets me inspect the input:

```
+-----+--------------------+--------------------+
|index|               point|                toid|
+-----+--------------------+--------------------+
|    1|[508180.748, 1953...|osgb4000000031043205|
|    2|[508163.122, 1953...|osgb4000000031043206|
|    3|[508172.075, 1953...|osgb4000000031043207|
|    4|[508513.0, 196023.0]|osgb4000000031043208|
|    5|[514358.399, 1503...|osgb4000000029797733|
```

I load the edges file the same way.

I now want to construct a graph with GraphX using Graph.apply. I know that this method does not accept DataFrames, only RDDs.
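For reference, Graph.apply takes an RDD[(VertexId, VD)] of vertices and an RDD[Edge[ED]] of edges, and builds the VertexRDD and EdgeRDD internally, so neither of those classes has to be created by hand. A minimal sketch with hand-made toy RDDs, assuming Spark 2.x or later with SparkSession and a local master; the values and app name are illustrative only:

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Local session for the sketch; in spark-shell getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("graph-apply-sketch").getOrCreate()
val sc = spark.sparkContext

// Vertices: (VertexId, attribute) pairs. VertexId is a type alias for Long.
val vs: RDD[(VertexId, String)] = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))

// Edges: Edge(srcId, dstId, attribute).
val es: RDD[Edge[Double]] = sc.parallelize(Seq(Edge(1L, 2L, 10.0), Edge(2L, 3L, 5.0)))

// Graph.apply builds the VertexRDD and EdgeRDD from these plain RDDs.
val graph: Graph[String, Double] = Graph(vs, es)
```

So the task reduces to turning each DataFrame into an RDD of one of these two shapes.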

To convert the DataFrames, I tried the .rdd method, which gives:

```
val vert_rdd = vertices.rdd
vert_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[12] at rdd at <console>:36
```

However, the result is an RDD of org.apache.spark.sql.Row objects rather than the (VertexId, attribute) pairs that Graph.apply expects, so it is not accepted as input when creating the graph.
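The RDD[Row] returned by .rdd is an ordinary RDD; it only needs a map step that pulls typed values out of each Row. A sketch, assuming Spark 2.x or later, that index becomes the VertexId and (point, toid) the vertex attribute; the small DataFrame is a hypothetical stand-in for the JSON file, with made-up values:

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("vertices-to-rdd").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for sqlContext.read.json("vertices.json.gz"):
// same column names and types as the real file, made-up values.
val vertices: DataFrame = Seq(
  (1L, Seq(1.0, 2.0), "node-a"),
  (2L, Seq(3.0, 4.0), "node-b")
).toDF("index", "point", "toid")

// .rdd yields an ordinary RDD[Row]; map each Row to (VertexId, attribute).
// Reading fields by name avoids depending on the column order.
val vertRdd: RDD[(VertexId, (Seq[Double], String))] = vertices.rdd.map { row =>
  val id    = row.getAs[Long]("index")        // bigint -> Long, i.e. VertexId
  val point = row.getAs[Seq[Double]]("point") // array<double> -> Seq[Double]
  val toid  = row.getAs[String]("toid")
  (id, (point, toid))
}
```

With the real data the DataFrame would come from sqlContext.read.json("vertices.json.gz") instead, and vertRdd can be passed to Graph(...) as the vertex argument.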

Is there a more direct way to load JSON files into GraphX, or a different way to convert a DataFrame to an RDD?

Edit.

Ultimately, the vertices file must be converted to a VertexRDD, where index serves as the VertexId and the point array and the toid string are the vertex attributes.

The edges file must be converted to an EdgeRDD.

The edges file looks like this:

```
edges: org.apache.spark.sql.DataFrame = [index: bigint, length: double, nature: string, negativeNode: string, polyline: array<double>, positiveNode: string, term: string, toid: string]

+-----+------------------+------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|index|            length|            nature|        negativeNode|            polyline|        positiveNode|                term|                toid|
+-----+------------------+------------------+--------------------+--------------------+--------------------+--------------------+--------------------+
|    1| 112.8275895775762|Single Carriageway|osgb4000000023183407|[492019.481, 1565...|osgb4000000023183409|Private Road - Re...|osgb4000000023296573|
|    2|141.57731318733806|Single Carriageway|osgb4000000023763485|[492144.493, 1567...|osgb4000000023183408|Private Road - Re...|osgb4000000023296574|
|    3|190.23352139011513|Single Carriageway|osgb4000000023183650|[492835.25, 15687...|osgb4000000023183652|Private Road - Re...|osgb4000000023296638|
```
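Because negativeNode and positiveNode hold toid strings rather than numeric ids, each endpoint has to be looked up in the vertices table before it can become an Edge. One way is a pair of joins. A sketch, assuming Spark 2.x or later, that negativeNode and positiveNode refer to the toid column of the vertices file, and that length and nature are the only edge attributes needed; the DataFrames are hypothetical stand-ins with made-up rows:

```scala
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("edges-to-rdd").getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for the two JSON files (only the columns used here,
// made-up values). The second edge points at a toid with no matching vertex.
val vertices: DataFrame = Seq((1L, "node-a"), (2L, "node-b")).toDF("index", "toid")
val edges: DataFrame = Seq(
  (1L, 112.8, "Single Carriageway", "node-a", "node-b", "link-1"),
  (2L, 50.0, "Single Carriageway", "node-b", "node-z", "link-2")
).toDF("index", "length", "nature", "negativeNode", "positiveNode", "toid")

// Lookup tables from a node's toid to its numeric index, one per endpoint.
val ids = vertices.select("toid", "index")
val srcIds = ids.withColumnRenamed("toid", "negativeNode").withColumnRenamed("index", "srcId")
val dstIds = ids.withColumnRenamed("toid", "positiveNode").withColumnRenamed("index", "dstId")

// Inner joins on the node toids; edges with an unknown endpoint are dropped.
val joined = edges.join(srcIds, "negativeNode").join(dstIds, "positiveNode")

// Turn each joined Row into a GraphX Edge carrying (length, nature).
val edgeRdd: RDD[Edge[(Double, String)]] = joined.rdd.map { row =>
  Edge(row.getAs[Long]("srcId"), row.getAs[Long]("dstId"),
    (row.getAs[Double]("length"), row.getAs[String]("nature")))
}
```

Note that the inner joins silently drop any edge whose endpoint toid is missing from the vertices file (link-2 above). The resulting edgeRdd can be passed to Graph(...) as the edge argument, alongside a vertex RDD keyed by index.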
LearningSlowly