This option works much like a dynamic partition overwrite: you are telling Spark to overwrite only the data that falls within those range partitions. In addition, the data will be written only if every row of your dataframe matches the replaceWhere condition; if even a single row does not match, an exception Data written out does not match replaceWhere will be thrown.
Q: Would this cause a deletion of 900 records?
A: Yes, it would delete them: every existing row in the partitions matched by the condition is removed and replaced by the new data.
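Conceptually, replaceWhere is a validate-then-delete-and-insert: Delta first checks that every incoming row satisfies the predicate, then deletes all existing rows that match it and appends the new data. Here is a minimal plain-Python sketch of that semantics (not Delta's actual implementation, just an illustration; the function name replace_where is made up):

```python
def replace_where(table, new_rows, predicate):
    """Simulate Delta's replaceWhere: validate, delete matching rows, insert."""
    # Every incoming row must satisfy the predicate, or the write fails.
    if not all(predicate(row) for row in new_rows):
        raise ValueError("Data written out does not match replaceWhere")
    # Keep only the rows outside the replaced range, then append the new data.
    return [row for row in table if not predicate(row)] + list(new_rows)

# A table of 1000 rows; 'even' flags even numbers, as in the test below.
table = [{'number': i, 'even': int(i % 2 == 0)} for i in range(1000)]

# Overwrite the even=1 "partition" with 5 new rows.
new_rows = [{'number': i, 'even': 1} for i in range(0, 10, 2)]
table = replace_where(table, new_rows, lambda r: r['even'] == 1)
print(len(table))  # 505: the 500 odd rows are kept, 5 even rows are written
```

Passing a row with even=0 to this call would raise the ValueError, mirroring the exception Delta throws when the written data does not match the condition.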
I ran a test, creating a dataframe with 2 columns:
root
|-- number: long (nullable = true)
|-- even: integer (nullable = true)
The first run will save 1000 rows, where 500 are even and 500 are odd:
from pyspark.sql import Row
from pyspark.sql import functions as f

rows = [Row(number=i) for i in range(0, 1000)]
df = spark.createDataFrame(rows)
df = df.withColumn('even', (f.col('number') % 2 == f.lit(0)).cast('int'))
(df
.write
.partitionBy('even')
.format('delta')
.saveAsTable('my_delta_table'))

The second run filters only the even rows and overwrites the partition where even=1:
rows = [Row(number=i) for i in range(0, 10)]
df_only_even = spark.createDataFrame(rows)
df_only_even = df_only_even.withColumn('even', (f.col('number') % 2 == f.lit(0)).cast('int'))
# Filtering is required: if any row does not match replaceWhere,
# the write operation will throw an error
df_only_even = df_only_even.where(f.col('even') == f.lit(1))
(df_only_even
.write
.partitionBy('even')
.format('delta')
.option('replaceWhere', 'even == 1')
.mode('overwrite')
.saveAsTable('my_delta_table'))

Result
My table my_delta_table now has 505 rows, where 500 are odd and 5 are even.
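The count checks out arithmetically: the overwrite deletes the 500 existing rows in the even=1 partition and inserts the 5 new even rows, leaving the 500 odd rows untouched. A quick sanity check:

```python
# Rows written in the first run, split by partition.
initial_even = sum(1 for i in range(1000) if i % 2 == 0)  # 500 rows in even=1
initial_odd = 1000 - initial_even                         # 500 rows in even=0

# Rows written in the second run (only even numbers from 0..9 survive the filter).
written_even = sum(1 for i in range(10) if i % 2 == 0)    # 5 rows

# replaceWhere drops all 500 old even rows and keeps the odd partition intact.
total = initial_odd + written_even
print(total)  # 505
```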
