I am running a Delta Lake merge on AWS, but I am not getting the correct result. Below is the PySpark script. It ran successfully, but the output contains fewer records than the original table.
deltaTable.alias("old") \
    .merge(df.alias("new"), join_string) \
    .whenMatchedUpdateAll() \
    .whenNotMatchedInsertAll() \
    .execute()
As the image below shows, numOutputRows should be ~226k. However, I only get 21k in the final result.
~226k records in the output table.
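To make the comparison concrete, here is a minimal sketch of how the merge metrics could be read back and set next to the actual row count of the table. The helper names (parse_merge_metrics, latest_operation_metrics, report) are hypothetical and not part of the script above. It assumes deltaTable is a delta.tables.DeltaTable and that the most recent commit in its history is the merge in question. The metric keys are the ones Delta Lake records in operationMetrics for a merge. As far as I understand, numOutputRows counts rows written by the merge (including copied rows in rewritten files), so it may not be the same thing as the total number of rows in the table.

```python
def parse_merge_metrics(operation_metrics):
    """Convert the row counters in Delta's operationMetrics map to ints.

    Delta stores the metric values as strings; keys that are absent
    default to 0 here (an assumption made for convenience).
    """
    keys = [
        "numSourceRows",
        "numTargetRowsInserted",
        "numTargetRowsUpdated",
        "numTargetRowsDeleted",
        "numTargetRowsCopied",
        "numOutputRows",
    ]
    return {key: int(operation_metrics.get(key, 0)) for key in keys}


def latest_operation_metrics(delta_table):
    """Return operationMetrics of the most recent commit as a plain dict.

    Assumes the latest commit is the merge; if other writes happened
    afterwards, pass a larger limit to history() and filter on the
    'operation' column instead.
    """
    row = delta_table.history(1).select("operationMetrics").collect()[0]
    return dict(row["operationMetrics"])


def report(delta_table):
    """Combine the merge counters with the current row count of the table."""
    metrics = parse_merge_metrics(latest_operation_metrics(delta_table))
    # Count rows by reading the Delta table itself, not a downstream copy.
    metrics["tableRowCount"] = delta_table.toDF().count()
    return metrics
```

Calling report(deltaTable) right after execute() and comparing tableRowCount with wherever the 21k figure comes from should show whether the Delta table itself is short of rows, or whether the smaller count comes from how the table is being read afterwards.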