
For instance, File A is loaded, then File B is loaded the next day. If File A is received again the day after that, the sequence should abort.

Can anyone help me out with this?

Thanks

Krishna

1 Answer

There are multiple ways to solve this, but please don't abort intentionally: intentional aborts will most likely come back at you like boomerangs.

  • Keep track of file names and file hashes (e.g. an MD5 checksum) in a table and check that table before loading. If the file is already known, handle or ignore it.
  • Just read the file again as if it was new or updated. Compare old data with new data using the Change Capture stage, handle data as needed, e.g. write changed and new data to target. (recommended)
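
The first option can be sketched outside DataStage in a few lines of Python. This is only a rough illustration under assumptions: the table name `loaded_files`, the helper names and the use of SQLite are all hypothetical, and in practice the registry would live in whatever database the jobs already use.

```python
import hashlib
import sqlite3
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_registry(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) registry table of files already loaded."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaded_files ("
        " file_name TEXT NOT NULL,"
        " md5 TEXT NOT NULL,"
        " loaded_at TEXT DEFAULT CURRENT_TIMESTAMP,"
        " PRIMARY KEY (file_name, md5))"
    )
    return conn


def is_known(conn: sqlite3.Connection, path: Path) -> bool:
    """True if a file with this name and this content was loaded before."""
    row = conn.execute(
        "SELECT 1 FROM loaded_files WHERE file_name = ? AND md5 = ?",
        (path.name, file_md5(path)),
    ).fetchone()
    return row is not None


def mark_loaded(conn: sqlite3.Connection, path: Path) -> None:
    """Record the file after it has been loaded successfully."""
    conn.execute(
        "INSERT OR IGNORE INTO loaded_files (file_name, md5) VALUES (?, ?)",
        (path.name, file_md5(path)),
    )
    conn.commit()
```

A job wrapper would call `is_known` before the load and `mark_loaded` only after the load succeeded, so a failed run does not block a retry. A file that arrives under the same name but with different content is treated as new here, because the lookup uses both name and hash.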

I would not recommend writing a sequence that "should abort", as this is not the goal of an ETL process. If the file contains the very same content that is already known, just ignore it. If it has updated data, handle it as needed. Only abort if there is a technical issue, e.g. the given file is incorrectly formatted. An abort of a job should indicate that something is wrong with the job. When you get a file twice, it's not the job that failed.
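
To make the "ignore identical data, handle changed data" idea concrete, here is a minimal key-based comparison in plain Python. It is a conceptual approximation only, not how the Change Capture stage is implemented; the function name `capture_changes` and the row layout (dictionaries with a key column) are assumptions for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def capture_changes(before: List[Row], after: List[Row], key: str) -> Dict[str, List[Row]]:
    """Compare two row sets on a key column and classify the differences."""
    old = {row[key]: row for row in before}
    new = {row[key]: row for row in after}
    return {
        # Keys that only exist in the new file.
        "insert": [new[k] for k in new if k not in old],
        # Keys present in both, but with different column values.
        "update": [new[k] for k in new if k in old and new[k] != old[k]],
        # Keys that disappeared from the new file.
        "delete": [old[k] for k in old if k not in new],
    }


# Example: File A arrives again with one edited row and one new row.
loaded = [{"id": "1", "name": "x"}, {"id": "2", "name": "y"}]
received = [{"id": "1", "name": "x"}, {"id": "2", "name": "z"}, {"id": "3", "name": "w"}]
changes = capture_changes(loaded, received, key="id")
```

If the very same file is received twice, all three lists come back empty, so there is nothing to write and nothing to abort.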

If an error is found in the data that needs to be fixed by others, write the information about it to a table. Have another, independent process monitor that table and tell the data producer about it (via dashboard, email, ...).
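
A rough sketch of that split between the loading job and the monitoring process could look like the following. The table `data_issues`, its columns and the function names are made up for illustration, and SQLite merely stands in for the real database.

```python
import sqlite3
from typing import List, Tuple


def open_issue_log(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the (hypothetical) table that collects data problems."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS data_issues ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " file_name TEXT NOT NULL,"
        " description TEXT NOT NULL,"
        " notified INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def log_issue(conn: sqlite3.Connection, file_name: str, description: str) -> None:
    """Called by the ETL job: record the problem and carry on."""
    conn.execute(
        "INSERT INTO data_issues (file_name, description) VALUES (?, ?)",
        (file_name, description),
    )
    conn.commit()


def collect_unnotified(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Called by the separate monitor: fetch new issues and mark them as sent."""
    rows = conn.execute(
        "SELECT id, file_name, description FROM data_issues"
        " WHERE notified = 0 ORDER BY id"
    ).fetchall()
    conn.executemany(
        "UPDATE data_issues SET notified = 1 WHERE id = ?",
        [(row[0],) for row in rows],
    )
    conn.commit()
    return [(row[1], row[2]) for row in rows]
```

The job only ever calls `log_issue`; the monitor polls `collect_unnotified` on its own schedule and forwards the result to a dashboard or by email, so a data problem never has to surface as a job abort.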

Justus Kenklies