I have a target table with 100 rows and an incremental table with 20 rows (updates).
When I perform a merge with whenMatchedUpdate using PySpark, the 20 matching rows in the target table are updated correctly, but the remaining 80 rows are getting set to null values. I'm not sure where it is going wrong. A simplified four-row example is below.
Target Table
+----------+---------+----------+
|center_nbr|center_id|active_ind|
+----------+---------+----------+
|     12345|     6559|         Y|
|     56789|    14257|         Y|
|     77777|     2218|         Y|
|     88888|     2218|         Y|
+----------+---------+----------+
Incremental Table
+----------+---------+----------+
|center_nbr|center_id|active_ind|
+----------+---------+----------+
|     12345|     6559|         N|
|     56789|    14257|         N|
|     77777|     2218|         Y|
|     88888|     2218|         Y|
+----------+---------+----------+
Final Target table after merge
+----------+---------+----------+
|center_nbr|center_id|active_ind|
+----------+---------+----------+
|     12345|     6559|         N|
|     56789|    14257|         N|
|      null|     null|      null|
|      null|     null|      null|
+----------+---------+----------+
Merge code:
df_tgt.alias("tgt").merge(
df_incr.alias("up"),
condition="tgt.center_nbr=up.center_nbr)\
.whenMatchedUpdate(set ={"tgt.active_ind": "up.active_ind"}).execute()
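For reference, a self-contained sketch of how I'm setting up and running the merge (the table paths are hypothetical; I load the target via DeltaTable.forPath, since merge() is a method on DeltaTable rather than on a plain DataFrame):

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = (
        SparkSession.builder.appName("merge-repro")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Target must be a DeltaTable for merge(); path is hypothetical.
    df_tgt = DeltaTable.forPath(spark, "/tmp/target_delta")

    # Incremental updates read as a regular DataFrame; path is hypothetical.
    df_incr = spark.read.format("delta").load("/tmp/incremental_delta")

    (
        df_tgt.alias("tgt")
        .merge(df_incr.alias("up"), "tgt.center_nbr = up.center_nbr")
        .whenMatchedUpdate(set={"tgt.active_ind": "up.active_ind"})
        .execute()
    )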