I am using PySpark in Azure Databricks. I attempted to write a Delta table containing a null column, created as follows:
import pyspark.sql.functions as funcs

df = df.withColumn('val2', funcs.lit(None))
using the following function
def write_to_delta_table(df, fnm, tnm, path):
    # expose the dataframe as a temp view so it can be selected via SQL
    df.createOrReplaceTempView(fnm)
    # repartition to a single partition and write as Delta,
    # overwriting both the data and the schema
    (spark.sql(f'''select * from {fnm}''')
         .repartition(1)
         .write.format('delta')
         .mode('overwrite')
         .option('overwriteSchema', 'true')
         .save(f'{path}/{fnm}'))
    # re-register the metastore table against the freshly written location
    spark.sql(f'''DROP TABLE IF EXISTS {tnm}; ''')
    spark.sql(f'''CREATE TABLE {tnm}
                  USING DELTA
                  LOCATION '{path}/{fnm}'
               ''')
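For reference, I call the function roughly like this (the view name, table name, and path below are just placeholders):

write_to_delta_table(df, 'my_view', 'my_table', '/mnt/lake/tables')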
I got an error and realized that I needed to cast the null column to its intended type:
from pyspark.sql.types import BooleanType

df = df.withColumn('val2', funcs.lit(None).cast(BooleanType()))
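Printing the schema after the cast shows that, on the dataframe side at least, the column is now boolean rather than void:

df.printSchema()
# ...
#  |-- val2: boolean (nullable = true)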
However, even after doing that, I am unable to write the dataframe using the function above. It shows an error like
org.apache.spark.SparkException: Cannot recognize hive type string: void, ...
for that column, raised by the DROP TABLE call:
--> spark.sql(f'''DROP TABLE IF EXISTS {tnm}; ''')
I see this error even if I physically delete the underlying parquet files from blob storage, so I cannot seem to delete or overwrite this Delta table at all. The error also appears if I remove the offending column and try to write again, since the DROP TABLE step still fails. Note that a version of the table already existed with a different schema.
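For completeness, this is roughly the retry without the offending column (same placeholder names as above); it still fails on the DROP TABLE statement:

df_no_val2 = df.drop('val2')
write_to_delta_table(df_no_val2, 'my_view', 'my_table', '/mnt/lake/tables')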