I am using Databricks Unity Catalog, and I have a requirement to upload a CSV file, process it, and load it into a final table. However, when I upload the file in Databricks, NULL data is converted to the string 'NULL', which is causing an issue. Do you have any ideas on how I can resolve this problem?
- What is the issue you are facing? – JayashankarGS Jul 03 '23 at 13:28
- When I upload a CSV file, Databricks treats null values as 'NULL' strings, which causes issues. Let's say it is a date field and I need to convert it; the conversion then fails because of the 'NULL' string. – SK ASIF ALI Jul 03 '23 at 13:37
- OK. Where are you using Databricks: Azure, AWS, or Community? – JayashankarGS Jul 03 '23 at 13:38
- I am using AWS Databricks. – SK ASIF ALI Jul 03 '23 at 13:39
1 Answer
CSV files by definition don't have any way to specify null values; everything is treated as a string. If you have some placeholder value inside your CSV, you can pass the nullValue parameter when reading the CSV data to specify which strings should be treated as nulls (see doc):
df = spark.read.csv(path, nullValue="null")
or specify it as option:
df = spark.read.format("csv") \
    .option("nullValue", "null") \
    .load(path)
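Outside Spark, the same idea can be sketched with Python's standard csv module: parse each row and map the placeholder string to a real None. This is only an illustration of what nullValue does during parsing — the sample data and column names are hypothetical:

```python
import csv
import io

# Hypothetical sample mirroring the question: the upload produced
# the literal string "NULL" where values were missing.
raw = "id,signup_date\n1,2023-07-01\n2,NULL\n"

def read_csv_with_null(text, null_value="NULL"):
    """Parse CSV text, mapping the placeholder string to None
    (the same idea as Spark's nullValue option)."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({k: (None if v == null_value else v) for k, v in row.items()})
    return rows

rows = read_csv_with_null(raw)
# rows[1]["signup_date"] is now None rather than the string "NULL"
```

Note that the placeholder comparison is exact and case-sensitive here, so match it to whatever string actually appears in your file ("NULL", "null", etc.).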

Alex Ott
- Thanks @Alex, but I am uploading the data and creating a Delta table out of it; I am not reading directly from the file. – SK ASIF ALI Jul 03 '23 at 15:34
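If the table was already created with the literal 'NULL' strings in it, the placeholders can still be normalized after the fact — in Spark that would be along the lines of `F.when(F.col(c) == "NULL", None).otherwise(F.col(c))` per column, or `NULLIF(c, 'NULL')` in SQL, before the date conversion. The same cleanup logic, sketched in plain Python on rows already loaded as strings (table shape and column names are illustrative, not from the question):

```python
# Hypothetical rows as they landed in the table: missing values
# arrived as the literal string "NULL".
loaded = [
    {"id": "1", "signup_date": "2023-07-01"},
    {"id": "2", "signup_date": "NULL"},
]

def nullify(rows, placeholder="NULL"):
    """Replace the placeholder string with None in every column --
    the plain-Python analogue of SQL's NULLIF(col, 'NULL')."""
    return [
        {k: (None if v == placeholder else v) for k, v in row.items()}
        for row in rows
    ]

cleaned = nullify(loaded)
# After cleanup, a date cast no longer sees the string "NULL".
```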