Is there a nice way of going from a Spark `DataFrame` to an `EdgeRDD` without hardcoding types in the Scala code? The examples I've seen use case classes to define the type of the `EdgeRDD`.
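For concreteness, this is the kind of hardcoded approach I mean (a rough sketch; the `EdgeAttr` case class and the `weight`/`label` column names are hypothetical placeholders, not from my actual data):

```scala
import org.apache.spark.graphx.{Edge, EdgeRDD}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Hypothetical attribute type, fixed at compile time -- exactly the
// hardcoding I would like to avoid.
case class EdgeAttr(weight: Long, label: String)

def toEdgeRdd(df: DataFrame): EdgeRDD[EdgeAttr] = {
  val edges: RDD[Edge[EdgeAttr]] = df.rdd.map { row =>
    Edge(
      row.getAs[Long]("srcID"),
      row.getAs[Long]("dstID"),
      EdgeAttr(row.getAs[Long]("weight"), row.getAs[String]("label")))
  }
  // The second type parameter (the vertex attribute type, Long here)
  // must be supplied even though we only build the edge side.
  EdgeRDD.fromEdges[EdgeAttr, Long](edges)
}
```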
Let's assume that our Spark `DataFrame` has `StructField("dstID", LongType, false)` and `StructField("srcID", LongType, false)`, plus between 0 and 22 additional `StructField`s (we constrain the count so that we can use a `TupleN` to represent them). Is there a clean way to define an `EdgeRDD[TupleN]` by grabbing the types from the `DataFrame`? As motivation, consider that we are loading a Parquet file that contains type information.
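The part I can do is inspect the schema at runtime; the part I can't see is how to bridge from there to a compile-time `TupleN`. Roughly (a sketch, assuming a `DataFrame` with the schema described above):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructField

// The attribute columns and their types are all available at runtime...
def attrFields(df: DataFrame): Array[StructField] =
  df.schema.fields.filterNot(f => Set("srcID", "dstID").contains(f.name))

// ...e.g. attrFields(df).map(_.dataType) might come back as
// Array(StringType, DoubleType), suggesting EdgeRDD[(String, Double)],
// but I don't see how to turn that runtime value into the
// compile-time TupleN type parameter.
```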
I'm very new to Spark and Scala, so I realize the question may be misguided. If so, I'd appreciate learning the "correct" way of thinking about this problem.