I have an RDD of (name, value) pairs: [('column 1',value), ('column 2',value), ('column 3',value), ... , ('column 100',value)]. I want to create a DataFrame with a single column whose values are tuples.
The closest I have gotten is:
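from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType, StringType, StructField, StructType
schema = StructType([StructField("char", StringType(), False), StructField("count", IntegerType(), False)])
my_udf = udf(lambda w, c: (w, c), schema)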
and then
df.select(my_udf('char', 'int').alias('char_int'))
but this produces a dataframe with a column of lists, not tuples.