
I am using PySpark in Azure Databricks. I attempted to write a Delta table with a null column, created as follows:

df = df.withColumn('val2', funcs.lit(None))

using the following function:

def write_to_delta_table(df, fnm, tnm, path):
  # Register the DataFrame as a temp view so it can be re-read through SQL
  df.createOrReplaceTempView(fnm)
  # Write a single-file Delta table, replacing any existing data and schema
  (spark.sql(f'select * from {fnm}')
    .repartition(1)
    .write.format('delta')
    .mode('overwrite')
    .option('overwriteSchema', 'true')
    .save(f'{path}/{fnm}'))
  # Re-register the metastore table against the freshly written location
  spark.sql(f'''DROP TABLE IF EXISTS {tnm}; ''')
  spark.sql(f'''CREATE TABLE {tnm}
    USING DELTA
    LOCATION '{path}/{fnm}'
    ''')

I got an error and realized that I needed to cast the null column to its intended type:

df = df.withColumn('val2', funcs.lit(None).cast(BooleanType()))

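For reference, a minimal self-contained version of that fix (assuming funcs is the usual alias for pyspark.sql.functions) looks like this:

from pyspark.sql import SparkSession
from pyspark.sql import functions as funcs
from pyspark.sql.types import BooleanType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,), (2,)], ['val1'])

# lit(None) alone yields a NullType column, which the Hive metastore
# reports as the unsupported type "void"; the cast gives it a real type
df = df.withColumn('val2', funcs.lit(None).cast(BooleanType()))
df.printSchema()
# root
#  |-- val1: long (nullable = true)
#  |-- val2: boolean (nullable = true)
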
However, even after doing that, I am unable to write the DataFrame using the function above. It shows an error such as

org.apache.spark.SparkException: Cannot recognize hive type string: void, ...

for that column in the DROP TABLE call:

--> spark.sql(f'''DROP TABLE IF EXISTS {tnm}; ''')

I see this error even if I physically delete the Parquet files from blob storage; I cannot seem to delete or overwrite this Delta table. I see this error even if I remove the offending column and try the write again, since the existing table still fails to drop. Note that a version of the table already existed with a different schema.
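For reference, one can ask the metastore directly what type it still records for the table; DESCRIBE TABLE is standard Spark SQL, although on a broken entry it may raise the same void error:

# Inspect the schema the metastore currently holds for the table;
# a column listed with type "void" confirms the stale NullType entry.
spark.sql(f'DESCRIBE TABLE {tnm}').show(truncate=False)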

GreenEye

1 Answer


It turns out that shutting down and restarting the cluster made this problem vanish. I can now overwrite the table.
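
If a full cluster restart is inconvenient, clearing Spark's cached metadata might achieve the same thing; this is an untested suggestion for this specific error, but both calls are standard Spark APIs:

# Drop all cached table data/metadata held by the session
spark.catalog.clearCache()
# Invalidate the cached entry for the specific table
spark.sql(f'REFRESH TABLE {tnm}')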

GreenEye