This happens from time to time, which is the strange part. My current solution:
Re-run the job! :disappointed: But this is very reactive, and I'm not happy with it.
This is what my merge statement looks like:
MERGE INTO target_tbl AS Target
USING df_source AS Source
ON Source.key = Target.key
WHEN MATCHED
  AND Target.ctl_utc_dts = '9999-12-31'
  AND Target.ctl_hash = Source.ctl_hash
  AND Source.ctl_start_utc_dts < Target.ctl_start_utc_dts THEN
  UPDATE SET
    Target.ctl_start_utc_dts = Source.ctl_start_utc_dts,
    Target.ctl_updated_run_id = Source.ctl_updated_run_id,
    Target.ctl_modified_utc_dts = Source.ctl_modified_utc_dts,
    Target.ctl_updated_batch_id = Source.ctl_updated_batch_id
WHEN MATCHED
  AND Target.ctl_utc_dts = '9999-12-31'
  AND Target.ctl_hash != Source.ctl_hash THEN
  UPDATE SET
    Target.ctl_utc_dts = Source.ctl_start_utc_dts,
    Target.ctl_updated_run_id = Source.ctl_updated_run_id,
    Target.ctl_modified_utc_dts = Source.ctl_modified_utc_dts,
    Target.ctl_updated_batch_id = Source.ctl_updated_batch_id
WHEN NOT MATCHED THEN
  INSERT (columns......)
  VALUES (columns......)
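In case it helps to see, the same merge written with the DeltaTable Python API looks roughly like this. This is a minimal sketch, not my actual code: the table path is a placeholder, and whenNotMatchedInsertAll stands in for the explicit INSERT(columns...) VALUES(columns...) column list.

from delta.tables import DeltaTable

# Placeholder path; the real target is the Delta table behind target_tbl
target = DeltaTable.forPath(spark, "s3://s3_bkt/path/to/target_tbl")

(target.alias("Target")
    .merge(df_source.alias("Source"), "Source.key = Target.key")
    # Same record, earlier start date: pull the start date back
    .whenMatchedUpdate(
        condition="""Target.ctl_utc_dts = '9999-12-31'
                     AND Target.ctl_hash = Source.ctl_hash
                     AND Source.ctl_start_utc_dts < Target.ctl_start_utc_dts""",
        set={
            "ctl_start_utc_dts": "Source.ctl_start_utc_dts",
            "ctl_updated_run_id": "Source.ctl_updated_run_id",
            "ctl_modified_utc_dts": "Source.ctl_modified_utc_dts",
            "ctl_updated_batch_id": "Source.ctl_updated_batch_id",
        })
    # Changed record: close out the current row
    .whenMatchedUpdate(
        condition="""Target.ctl_utc_dts = '9999-12-31'
                     AND Target.ctl_hash != Source.ctl_hash""",
        set={
            "ctl_utc_dts": "Source.ctl_start_utc_dts",
            "ctl_updated_run_id": "Source.ctl_updated_run_id",
            "ctl_modified_utc_dts": "Source.ctl_modified_utc_dts",
            "ctl_updated_batch_id": "Source.ctl_updated_batch_id",
        })
    # Stand-in for the explicit INSERT(columns...) VALUES(columns...)
    .whenNotMatchedInsertAll()
    .execute())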
Spark App configuration:
--conf spark.yarn.stagingDir=hdfs://$(hostname -f):8020/user/hadoop
--conf spark.yarn.appMasterEnv.SPARK_HOME=/usr/lib/spark
--conf spark.yarn.submit.waitAppCompletion=true
--conf spark.yarn.maxAppAttempts=5
--conf yarn.resourcemanager.am.max-attempts=5
--conf spark.shuffle.service.enabled=true
--executor-memory 24G
--driver-memory 60G
--driver-cores 6
--executor-cores 4
--conf spark.executor.asyncEagerFileSystemInit.paths=s3://s3_bkt
--conf spark.dynamicAllocation.maxExecutors=24
--packages io.delta:delta-core_2.12:1.0.0
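For completeness, the SparkSession setup assumed by the merge above looks roughly like this. A minimal sketch only: the app name is made up, and the two Delta settings are the ones the delta-core 1.0.0 docs call for, whether they are set here or via --conf in spark-submit is up to you.

from pyspark.sql import SparkSession

# Minimal session sketch for delta-core 1.0.0 on Spark 3.1
spark = (SparkSession.builder
    .appName("scd_merge_job")  # hypothetical app name
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate())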
Running on AWS EMR
Release label: emr-6.4.0
Hadoop distribution: Amazon 3.2.1
Applications: Spark 3.1.2
Any pointers on things I need to do differently? Any help or ideas would be great! Thanks all.