I would like to dynamically generate a DataFrame containing a header record for a report, that is, create a DataFrame from the value of the string below. Building the schema from that string works:
val headerDescs : String = "Name,Age,Location"
val headerSchema = StructType(headerDescs.split(",").map(fieldName => StructField(fieldName, StringType, true)))
However, I now want to do the same for the data (which is in effect the same data, i.e. the metadata).
I create an RDD:
val headerRDD = sc.parallelize(headerDescs.split(","))
I then intended to use createDataFrame to create it:
val headerDf = sqlContext.createDataFrame(headerRDD, headerSchema)
However, that fails because createDataFrame is expecting an RDD[Row], whereas my RDD is an RDD of strings. I can't find a way of converting my RDD to an RDD of Rows and then mapping the fields dynamically. The examples I've seen assume you know the number of columns beforehand, but I eventually want to be able to change the columns without changing the code, for example by keeping the columns in a file.
Code excerpt based on first answer:
val headerDescs : String = "Name,Age,Location"
// create the schema from a string, splitting by delimiter
val headerSchema = StructType(headerDescs.split(",").map(fieldName => StructField(fieldName, StringType, true)))
// create a row from a string, splitting by delimiter
val headerRDDRows = sc.parallelize(headerDescs.split(",")).map( a => Row(a))
val headerDf = sqlContext.createDataFrame(headerRDDRows, headerSchema)
headerDf.show()
Executing this results in:
+--------+---+--------+
|    Name|Age|Location|
+--------+---+--------+
|    Name|
|     Age|
|Location|
+--------+---+-------
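Looking at this output, each header name ends up in its own single-field Row, because map(a => Row(a)) is applied to every element of the split array, so three one-column rows are produced against a three-column schema. A minimal sketch of what seems to be needed instead, assuming the goal is a single row with one field per column and that sc and sqlContext already exist as in the excerpt above: build one Row from the whole split array with Row.fromSeq and parallelize a one-element sequence. The names headerFields and headerRow are only illustrative, and this has not been verified against every Spark version.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumption: `sc` (SparkContext) and `sqlContext` (SQLContext) are already
// defined, as in the excerpt above (e.g. inside spark-shell).
val headerDescs: String = "Name,Age,Location"

// Split once and reuse the result for both the schema and the data.
val headerFields: Array[String] = headerDescs.split(",")

// One nullable string column per field name, however many there are.
val headerSchema = StructType(
  headerFields.map(fieldName => StructField(fieldName, StringType, true))
)

// A single Row holding all the fields, rather than one Row per field.
val headerRow: Row = Row.fromSeq(headerFields.toSeq)

// An RDD[Row] containing exactly one element: the header record.
val headerRDDRows = sc.parallelize(Seq(headerRow))

val headerDf = sqlContext.createDataFrame(headerRDDRows, headerSchema)
headerDf.show()
```

Because both the schema and the row are derived from the same split array, the number of columns is never hard-coded, so the header string could later be read from a file without changing this code.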