Is there a nice way of going from a Spark `DataFrame` to an `EdgeRDD` without hardcoding types in the Scala code? The examples I've seen use case classes to define the type of the `EdgeRDD`.
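For concreteness, this is the kind of hardcoded approach I mean (a rough sketch; the `EdgeAttr` case class and the `weight`/`label` column names are hypothetical placeholders, not from my actual data):

```scala
import org.apache.spark.graphx.{Edge, EdgeRDD}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame

// Hypothetical attribute type, fixed at compile time -- exactly the
// hardcoding I would like to avoid.
case class EdgeAttr(weight: Long, label: String)

def toEdgeRdd(df: DataFrame): EdgeRDD[EdgeAttr] = {
  val edges: RDD[Edge[EdgeAttr]] = df.rdd.map { row =>
    Edge(
      row.getAs[Long]("srcID"),
      row.getAs[Long]("dstID"),
      EdgeAttr(row.getAs[Long]("weight"), row.getAs[String]("label")))
  }
  // The second type parameter (the vertex attribute type, Long here)
  // must be supplied even though we only build the edge side.
  EdgeRDD.fromEdges[EdgeAttr, Long](edges)
}
```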
Let's assume that our Spark `DataFrame` has `StructField("dstID", LongType, false)` and `StructField("srcID", LongType, false)`, plus between 0 and 22 additional `StructField`s (we constrain the count so that we can use a `TupleN` to represent them). Is there a clean way to define an `EdgeRDD[TupleN]` by grabbing the types from the `DataFrame`? As motivation, consider that we are loading a Parquet file that contains type information.
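The part I can do is inspect the schema at runtime; the part I can't see is how to bridge from there to a compile-time `TupleN`. Roughly (a sketch, assuming a `DataFrame` with the schema described above):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructField

// The attribute columns and their types are all available at runtime...
def attrFields(df: DataFrame): Array[StructField] =
  df.schema.fields.filterNot(f => Set("srcID", "dstID").contains(f.name))

// ...e.g. attrFields(df).map(_.dataType) might come back as
// Array(StringType, DoubleType), suggesting EdgeRDD[(String, Double)],
// but I don't see how to turn that runtime value into the
// compile-time TupleN type parameter.
```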
I'm very new to Spark and Scala, so I realize the question may be misguided. If so, I'd appreciate learning the "correct" way of thinking about this problem.