I am using Databricks Unity Catalog, and I have a requirement to upload a CSV file, process it, and load it into a final table. However, when I upload the file in Databricks, NULL data is converted to the string 'NULL', which is causing an issue. Do you have any ideas on how I can resolve this problem?
- What is the issue you are facing? – JayashankarGS Jul 03 '23 at 13:28
- When I upload a CSV file, Databricks treats null values as 'NULL' strings, which causes issues. Let's say it is a date field and I need to convert it; the conversion then fails because of the 'NULL' string. – SK ASIF ALI Jul 03 '23 at 13:37
- OK. Where are you using Databricks: Azure, AWS, or Community? – JayashankarGS Jul 03 '23 at 13:38
- I am using AWS Databricks. – SK ASIF ALI Jul 03 '23 at 13:39
1 Answer
CSV files by definition don't have any way to specify null values; everything is treated as a string. If you have some placeholder value inside your CSV, you can pass the nullValue parameter when reading the CSV data to specify which strings should be treated as nulls (see doc):
df = spark.read.csv(path, nullValue="null")
or specify it as option:
df = spark.read.format("csv") \
    .option("nullValue", "null") \
    .load(path)
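Outside Spark, the same idea can be sketched with Python's standard csv module: parse each row and map the placeholder string to a real None. This is only an illustration of what nullValue does during parsing — the sample data and column names are hypothetical:

```python
import csv
import io

# Hypothetical sample mirroring the question: the upload produced
# the literal string "NULL" where values were missing.
raw = "id,signup_date\n1,2023-07-01\n2,NULL\n"

def read_csv_with_null(text, null_value="NULL"):
    """Parse CSV text, mapping the placeholder string to None
    (the same idea as Spark's nullValue option)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({k: (None if v == null_value else v) for k, v in row.items()})
    return rows

rows = read_csv_with_null(raw)
# rows[1]["signup_date"] is now None rather than the string "NULL"
```

Note that the placeholder comparison is exact and case-sensitive here, so match it to whatever string actually appears in your file ("NULL", "null", etc.).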

Alex Ott
- Thanks @Alex, but I am uploading the data and creating a Delta table out of it; I am not reading directly from the file. – SK ASIF ALI Jul 03 '23 at 15:34
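If the table was already created with the literal 'NULL' strings in it, the placeholders can still be normalized after the fact — in Spark that would be along the lines of `F.when(F.col(c) == "NULL", None).otherwise(F.col(c))` per column, or `NULLIF(c, 'NULL')` in SQL, before the date conversion. The same cleanup logic, sketched in plain Python on rows already loaded as strings (table shape and column names are illustrative, not from the question):

```python
# Hypothetical rows as they landed in the table: missing values
# arrived as the literal string "NULL".
loaded = [
    {"id": "1", "signup_date": "2023-07-01"},
    {"id": "2", "signup_date": "NULL"},
]

def nullify(rows, placeholder="NULL"):
    """Replace the placeholder string with None in every column --
    the plain-Python analogue of SQL's NULLIF(col, 'NULL')."""
    return [
        {k: (None if v == placeholder else v) for k, v in row.items()}
        for row in rows
    ]

cleaned = nullify(loaded)
# After cleanup, a date cast no longer sees the string "NULL".
```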