How to check for duplicate records between the source file and target table in DataStage

Question

I want to do two types of duplicate checking:

Whether we have already loaded a file with that name previously.

For instance, file A is loaded into the target table. If we receive file A again in a subsequent run, the sequence should be aborted because that file has already been loaded.

Whether we have already loaded a file with identical records.

For instance, file A is already in the target table, and next time we receive file B. Any records in file B that were already loaded into the target with file A should not be loaded, and the job should be aborted.

Can anyone help me with this scenario?

Thanks, Venkat.

score 0 · Answer 1 · answered Jun 30 '22 at 00:00

You need to keep a record of which file names have been loaded, typically by moving each file to an archive (or "processed") directory once it has been loaded. You can then use a simple ls command with the incoming file name to check whether it already exists in that directory, which solves your first requirement.

Determining whether file B contains records identical to those in file A is a more complex question. Can you use a diff command? Otherwise, you may need to do something cleverer. Even before that, how do you establish that file A is the one you have to compare against? If the records have key values, you may be able to check them against the target table instead.
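A minimal Python sketch of both checks, under several assumptions not stated in the question: loaded files are moved to an archive directory (the path you pass is hypothetical), the incoming file is a CSV with a header row, and the key values already in the target table have been fetched separately, for example with a SELECT on the key column. The function names (`already_loaded`, `duplicate_keys`, `check_before_load`) are illustrative only, not part of DataStage. The second check compares keys against the target rather than running diff against file A, which sidesteps the question of which earlier file to compare with.

```python
import csv
import sys
from pathlib import Path
from typing import Iterable, Set


def already_loaded(file_name: str, archive_dir: str) -> bool:
    """Requirement 1: True if a file with the same name is already in the archive directory.

    This is the Python equivalent of running `ls archive_dir/<name>`.
    """
    return (Path(archive_dir) / Path(file_name).name).exists()


def duplicate_keys(csv_path: str, key_column: str, loaded_keys: Iterable[str]) -> Set[str]:
    """Requirement 2: return the key values in the incoming file that already exist in the target.

    `loaded_keys` is assumed to come from a query on the target table's key column
    (hypothetical; how you fetch it depends on your database).
    """
    existing = set(loaded_keys)
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)  # uses the header row for column names
        return {row[key_column] for row in reader if row[key_column] in existing}


def check_before_load(
    csv_path: str, archive_dir: str, key_column: str, loaded_keys: Iterable[str]
) -> int:
    """Return 0 if the file is safe to load, or a non-zero code the sequence can abort on."""
    if already_loaded(csv_path, archive_dir):
        print(f"{csv_path}: a file with this name was already loaded", file=sys.stderr)
        return 1
    dups = duplicate_keys(csv_path, key_column, loaded_keys)
    if dups:
        print(f"{csv_path}: {len(dups)} key(s) already in the target table", file=sys.stderr)
        return 2
    return 0
```

One way to wire this in would be to call it from a wrapper script run by an Execute Command activity at the start of the sequence and branch on the script's return value, so any non-zero result stops the load before the parallel job runs.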

