Getting AnalysisException on Databricks while create delta format

Question

Am trying to write dataframe as .delta format but getting 'AnalysisExcpetion'

code:

df = spark.read.format("csv").option("header", True).option("inferSchema",True).load('dbfs:/mnt/raw/Sales.csv')
df.write.format("delta").mode("overwrite").save('/mnt/bronze/sales/')

Exception:

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<command-3501976434710565> in <module>
      1 df = spark.read.format("csv").option("header", True).option("inferSchema",True).load('dbfs:/mnt/raw/Sales.csv')
----> 2 df.write.format("delta").mode("overwrite").save('/mnt/bronze/sales/')

/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
    738             self._jwrite.save()
    739         else:
--> 740             self._jwrite.save(path)
    741 
    742     @since(1.4)

/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302 
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306 

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)

** can write as 'csv' getting error when format is 'delta'

any clues?

Saideep Arikontham · Answer 1 · 2022-07-13T03:17:19.997

I tried reproducing this and got the same error AnalysisException. The following is an image of the same.

I solved it using the following procedure. Here when we are writing the dataframe to a directory (sales):

When we use format as csv it directly writes the part files to this directory. Therefore, it does not throw any error. Refer the following image.

df.write.format("csv").mode("overwrite").save('/mnt/data/bronze/sales/')

But when we write to directory using delta format, the write is checking for a delta transactional log. Since it is absent in the directory, it is throwing error. So, instead of writing directly to sales directory, choose a name for your delta file and then save.

df.write.format("delta").mode("overwrite").save('/mnt/data/bronze/sales/demo_delta_table')

This creates a folder which consists of the parquet data file along with delta log. You can proceed to use this path for further processing.

So, replace your code with the one given below:

df.write.format("delta").mode("overwrite").save('/mnt/bronze/sales/<delta_name>')

#if sales it the name of your delta, then remove leading /
#df.write.format("delta").mode("overwrite").save('/mnt/bronze/sales')

score 0 · Answer 2 · answered Oct 14 '22 at 08:22

Delta files don't respond well to spaces and special characters (apart from dash - and underscore _) in the column headers, for some reason.

A quick solution when using pandas is to replace potential spaces in your column headers with underscore:

df.columns = [col.replace(' ', '_') for col in df.columns]

If you have other special characters in your headers (e.g. (), [], {} etc. then these must be removed as well.

Getting AnalysisException on Databricks while create delta format

2 Answers2