0

Am trying to write dataframe as .delta format but getting 'AnalysisExcpetion'

code:

df = spark.read.format("csv").option("header", True).option("inferSchema",True).load('dbfs:/mnt/raw/Sales.csv')
df.write.format("delta").mode("overwrite").save('/mnt/bronze/sales/')

Exception:

---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
<command-3501976434710565> in <module>
      1 df = spark.read.format("csv").option("header", True).option("inferSchema",True).load('dbfs:/mnt/raw/Sales.csv')
----> 2 df.write.format("delta").mode("overwrite").save('/mnt/bronze/sales/')

/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
    738             self._jwrite.save()
    739         else:
--> 740             self._jwrite.save(path)
    741 
    742     @since(1.4)

/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1302 
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306 

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)

** can write as 'csv' getting error when format is 'delta'

any clues?

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
Sreedhar
  • 29,307
  • 34
  • 118
  • 188

2 Answers2

0

I tried reproducing this and got the same error AnalysisException. The following is an image of the same.

enter image description here

I solved it using the following procedure. Here when we are writing the dataframe to a directory (sales):

  • When we use format as csv it directly writes the part files to this directory. Therefore, it does not throw any error. Refer the following image.
df.write.format("csv").mode("overwrite").save('/mnt/data/bronze/sales/')

enter image description here

  • But when we write to directory using delta format, the write is checking for a delta transactional log. Since it is absent in the directory, it is throwing error. So, instead of writing directly to sales directory, choose a name for your delta file and then save.
df.write.format("delta").mode("overwrite").save('/mnt/data/bronze/sales/demo_delta_table')

enter image description here

  • This creates a folder which consists of the parquet data file along with delta log. You can proceed to use this path for further processing.

enter image description here

  • So, replace your code with the one given below:
df.write.format("delta").mode("overwrite").save('/mnt/bronze/sales/<delta_name>')

#if sales it the name of your delta, then remove leading /
#df.write.format("delta").mode("overwrite").save('/mnt/bronze/sales')
Saideep Arikontham
  • 5,558
  • 2
  • 3
  • 11
0

Delta files don't respond well to spaces and special characters (apart from dash - and underscore _) in the column headers, for some reason.

A quick solution when using pandas is to replace potential spaces in your column headers with underscore:

df.columns = [col.replace(' ', '_') for col in df.columns]

If you have other special characters in your headers (e.g. (), [], {} etc. then these must be removed as well.