How do I resolve this issue?

rdd.collect()  # ['3e866d48b59e8ac8aece79597df9fb4c'...]

rdd.toDF()  # Can not infer schema for type: <type 'str'>

from pyspark.sql.types import StructType, StructField, StringType
myschema = StructType([StructField("col1", StringType(), True)])
rdd.toDF(myschema).show()

# StructType can not accept object "3e866d48b59e8ac8aece79597df9fb4c" in type

Bala

1 Answer

It seems you have:

rdd = sc.parallelize(['3e866d48b59e8ac8aece79597df9fb4c'])

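This is a one-dimensional data structure (an RDD of bare strings), whereas a DataFrame is two-dimensional: each element must be a row, even if that row has only one column. Mapping each string to a one-element tuple solves the problem: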

rdd.map(lambda x: (x,)).toDF().show()
+--------------------+
|                  _1|
+--------------------+
|3e866d48b59e8ac8a...|
+--------------------+
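
If you also want the column to be called col1, as in the schema from the question, the same wrapping should work together with an explicit schema. The following is only a sketch: as_row and to_single_column_df are hypothetical helper names, not part of PySpark, and it assumes an active SparkSession and an RDD of plain strings like the one above.

```python
def as_row(value):
    """Wrap a bare value in a one-element tuple: one row with one column."""
    return (value,)


def to_single_column_df(rdd, column_name="col1"):
    """Hypothetical helper: turn an RDD of strings into a one-column DataFrame.

    Assumes a SparkSession is already active, since RDD.toDF needs one.
    """
    # Imported here so that as_row can be used without PySpark installed.
    from pyspark.sql.types import StructType, StructField, StringType

    # Same schema as in the question: a single nullable string column.
    schema = StructType([StructField(column_name, StringType(), True)])

    # Each element becomes a tuple first, so it matches the struct schema.
    return rdd.map(as_row).toDF(schema)
```

With the rdd from the question, to_single_column_df(rdd).show() should then display the same value under a col1 header instead of _1.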
Psidom