I am reading a JSON file with an explicitly specified schema in Spark 2.3.
One of the columns I declared as non-nullable comes back as nullable, which is not what I expected. In other words, the nullability in my schema is not being applied to the JSON data.
See:
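import org.apache.spark.sql.types.{StringType, StructField, StructType}
// Declare id_str as a non-nullable string column
val twitterSchema = (new StructType)
.add(StructField("id_str", StringType, false))
twitterSchema.printTreeString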
root
|-- id_str: string (nullable = false) <------ false, as specified in the schema
val mdf = spark.read.option("multiline", "true").option("inferSchema","false").schema(twitterSchema).json("/FileStore/tables/twitter.json")
mdf.show(false)
mdf.printSchema
root
|-- id_str: string (nullable = true) <--------- true? Why?
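
A possible workaround, sketched here under assumptions rather than as a confirmed fix: since the reader seems to relax the column to nullable, the declared schema can be re-applied afterwards by rebuilding the DataFrame from its underlying RDD with `spark.createDataFrame(df.rdd, schema)`. In the sketch below, `ReapplySchema` and `reapplySchema` are hypothetical names, and a small in-memory DataFrame stands in for the result of reading `twitter.json`.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ReapplySchema {
  // Hypothetical helper: rebuild the DataFrame on top of the same rows,
  // but with the caller's schema (including its nullable flags).
  def reapplySchema(spark: SparkSession, df: DataFrame, schema: StructType): DataFrame =
    spark.createDataFrame(df.rdd, schema)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("reapply-schema")
      .getOrCreate()

    // The strict schema from the question: id_str must not be null.
    val twitterSchema = new StructType()
      .add(StructField("id_str", StringType, nullable = false))

    // Assumption: stand-in for mdf, i.e. the same column but marked nullable,
    // as it appears after reading the JSON file.
    val relaxedSchema = new StructType()
      .add(StructField("id_str", StringType, nullable = true))
    val rows = spark.sparkContext.parallelize(Seq(Row("1"), Row("2")))
    val mdf = spark.createDataFrame(rows, relaxedSchema)

    val strict = reapplySchema(spark, mdf, twitterSchema)
    strict.printSchema()
    assert(!strict.schema("id_str").nullable)

    spark.stop()
  }
}
```

This only changes the schema metadata; it does not validate the data. If a row actually contains a null `id_str`, the job may fail or misbehave at runtime, so filtering or checking for nulls first would be prudent.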