So for something like this:
case class RandomClass(stringOne: String, stringTwo: String, numericOne: Int)
val ds = Seq(
RandomClass("a", null, 1),
RandomClass("a", "x", 3),
RandomClass("a", "y", 4),
RandomClass("a", null, 5)
).toDS()
ds.printSchema()
results in
root
|-- stringOne: string (nullable = true)
|-- stringTwo: string (nullable = true)
|-- numericOne: integer (nullable = false)
why would stringOne
be nullable?
Strangely, numericOne
is inferred correctly. I assume I am just missing something about the relationship between Dataset and DataFrame API?