My use case is that I want to partition my table by date: rows for new dates should be appended, but if the code is rerun for a date that already exists, that date's partition should be overwritten.
After looking online, it seems this can be done with Delta Lake's replaceWhere feature, but I am fine with any solution that uses plain parquet (see the sketch at the end of this post).
I have the following code:
from datetime import date
from pyspark.sql import SparkSession
from pyspark.sql.types import DateType, StringType, StructField, StructType
spark = SparkSession.builder.getOrCreate()
data = [(date(2022, 6, 19), "Hello"), (date(2022, 6, 19), "World")]
schema = StructType([StructField("date", DateType()), StructField("message", StringType())])
df = spark.createDataFrame(data, schema=schema)
# Overwrite only the date=2022-06-19 partition at each target path.
df.write.partitionBy("date").option("replaceWhere", "date = '2022-06-19'").save("/tmp/test", mode="overwrite", format="delta")
df.write.partitionBy("date").option("replaceWhere", "date = '2022-06-19'").save("/tmp/test_3", mode="overwrite", format="delta")
The second write call throws the following exception:
pyspark.sql.utils.AnalysisException: Data written out does not match replaceWhere 'date = '2022-06-19''.
CHECK constraint EXPRESSION(('date = 2022-06-19)) (date = '2022-06-19') violated by row with values:
- date : 17337
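For reference, this is roughly what I mean by a parquet-only solution: a minimal sketch using Spark's dynamic partition overwrite mode, which I have not verified against my full pipeline (the /tmp/test_parquet path is just a placeholder). If replaceWhere is not the right tool here, I would be happy with this kind of approach instead.
from datetime import date
from pyspark.sql import SparkSession
from pyspark.sql.types import DateType, StringType, StructField, StructType
spark = SparkSession.builder.getOrCreate()
# Dynamic mode: an "overwrite" write only replaces the partitions present
# in the incoming DataFrame; other date partitions are left untouched.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
data = [(date(2022, 6, 19), "Hello"), (date(2022, 6, 19), "World")]
schema = StructType([StructField("date", DateType()), StructField("message", StringType())])
df = spark.createDataFrame(data, schema=schema)
# Placeholder path; rerunning this only rewrites the date=2022-06-19 partition.
df.write.partitionBy("date").mode("overwrite").parquet("/tmp/test_parquet")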