I have the following Delta table:
+-+----+
|A|B |
+-+----+
|1|10 |
|1|null|
|2|20 |
|2|null|
+-+----+
I want to fill the null values in column B based on column A, i.e. with the non-null B value from the rows that share the same A.
I came up with this:
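import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.last
var df = spark.sql("select * from MyDeltaTable")
val w = Window.partitionBy("A") // no orderBy: the frame is the whole A partition
df = df.withColumn("B", last("B", ignoreNulls = true).over(w)) // last non-null B per A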
This gives me the desired output:
+-+----+
|A|B |
+-+----+
|1|10 |
|1|10 |
|2|20 |
|2|20 |
+-+----+
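To reproduce the fill locally without the real table, here is a minimal self-contained sketch of the same window logic applied to an in-memory copy of the example data. The object name `FillNullsDemo`, the helper `fillB`, and the `local[*]` master are illustrative assumptions, not part of the original setup.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.last

object FillNullsDemo {
  // Hypothetical helper: replace each B with the last non-null B in its A partition.
  // With no orderBy the window frame is the whole partition, so every row in a
  // group gets the same value (which one is picked is not guaranteed if a group
  // holds several different non-null B values).
  def fillB(df: DataFrame): DataFrame = {
    val w = Window.partitionBy("A")
    df.withColumn("B", last("B", ignoreNulls = true).over(w))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("fill-nulls").getOrCreate()
    import spark.implicits._

    // The example table from the question; Option[Int] makes B nullable
    val df = Seq[(Int, Option[Int])]((1, Some(10)), (1, None), (2, Some(20)), (2, None)).toDF("A", "B")

    fillB(df).orderBy("A").show()
    spark.stop()
  }
}
```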
Now, my question is:
What is the best way to write the result back to my Delta table?
Should I use MERGE, or rewrite the table with the overwrite option?
My Delta table is huge and will keep growing, so I am looking for the most efficient method.
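For the MERGE option, a minimal sketch under stated assumptions: instead of rewriting every row, compute one fill value per A and merge it only into target rows whose B is null. This assumes the table is registered in the metastore as `MyDeltaTable`, that Delta Lake's Scala API (`io.delta.tables.DeltaTable`) is on the classpath, and that each A has a single non-null B as in the example (otherwise `first` picks an arbitrary one). The names `MergeFillNulls` and `fillValues` are hypothetical.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, first}

object MergeFillNulls {
  // Hypothetical helper: one row per A with a non-null B.
  // MERGE fails if several source rows match the same target row,
  // so the source must be deduplicated on the join key A.
  def fillValues(df: DataFrame): DataFrame =
    df.filter(col("B").isNotNull)
      .groupBy("A")
      .agg(first("B").as("fillB"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("merge-fill").getOrCreate()

    // Assumption: the Delta table is registered as MyDeltaTable
    val target = DeltaTable.forName(spark, "MyDeltaTable")
    val source = fillValues(target.toDF)

    target.as("t")
      // Putting "t.B IS NULL" in the merge condition restricts matches to
      // rows that actually need filling; rows with a value are not matched
      .merge(source.as("s"), "t.A = s.A AND t.B IS NULL")
      .whenMatched()
      .updateExpr(Map("B" -> "s.fillB"))
      .execute()

    spark.stop()
  }
}
```

The design choice here is that only rows still missing a value take part in the update, rather than the whole table being rewritten as with an overwrite.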
Thank you