Let's say I've got a simple pipe-delimited file with missing values:
A|B||D
I read that into a DataFrame:
val foo = spark.read.format("csv").option("delimiter","|").load("/path/to/my/file.txt")
The missing third value, instead of being null, comes through as the string null:
+---+---+----+---+
|_c0|_c1| _c2|_c3|
+---+---+----+---+
|  A|  B|null|  D|
+---+---+----+---+
I would like to be able to control what Spark does with that missing value. According to the CSV documentation, there is a nullValue option:
nullValue: string that indicates a null value, any fields matching this string will be set as nulls in the DataFrame.
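As I understand it, that option is set at read time, something like this (the "NULL" token below is just a placeholder for whatever string marks nulls in a file):

val foo = spark.read.format("csv")
  .option("delimiter", "|")
  .option("nullValue", "NULL") // fields whose text is exactly "NULL" become null
  .load("/path/to/my/file.txt")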
But since I don't have any value there, I can't seem to control it that way.
Short of testing every single field of every single file I read in, is there a way to control what Spark does with these blanks? For example, replace them with 'N/A'.
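The closest I've come is filling the nulls after the read, roughly like this (a sketch, assuming the blanks really do come through as nulls rather than the literal string "null"):

foo.na.fill("N/A") // replaces nulls in string columns with "N/A"

But I'd rather handle it in the read itself, if that's possible.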
We're using Spark 2.1, if it's relevant.