
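I am reading a JSON file with an explicitly specified schema in Spark 2.3.

One of the columns I declared as non-nullable shows up as nullable in the resulting DataFrame, which I did not expect. In other words, the schema I specified for the JSON file is not being applied as written.

Here is the schema and the read: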

import org.apache.spark.sql.types._

val twitterSchema = (new StructType)
  .add(StructField("id_str", StringType, false))
twitterSchema.printTreeString
root
 |-- id_str: string (nullable = false) <------ False. Specified schema

val mdf = spark.read
  .option("multiline", "true")
  .option("inferSchema", "false")
  .schema(twitterSchema)
  .json("/FileStore/tables/twitter.json")
mdf.show(false)
mdf.printSchema

root
 |-- id_str: string (nullable = true) <--------- True? Why?
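
To make the mismatch concrete, the declared and actual nullability flags can be compared field by field. The following is a minimal sketch, assuming it runs in the same session as the code above so that `spark`, `twitterSchema` and `mdf` are in scope; `relaxed` and `strictDf` are hypothetical names introduced here. The last step rebuilds the DataFrame from its RDD with the declared schema. That is a commonly used workaround, not an explanation of the behaviour, and it does not check that the data is actually free of nulls.

```scala
import org.apache.spark.sql.types._

// Fields declared non-nullable in twitterSchema but reported as nullable by the reader.
val relaxed = twitterSchema.fields
  .filter(f => !f.nullable && mdf.schema(f.name).nullable)
  .map(_.name)
println(s"Relaxed to nullable: ${relaxed.mkString(", ")}")

// Workaround sketch: re-apply the declared schema on top of the rows already read.
// Assumption: these columns really contain no nulls; this step does not validate that.
val strictDf = spark.createDataFrame(mdf.rdd, twitterSchema)
strictDf.printSchema
```

If a non-nullable column does contain nulls, the rebuilt DataFrame may fail or give wrong results later, so a null check on the affected columns before this step would be prudent.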
Jill Clover
