
I want to do two types of duplicate checking:

  1. Whether we have already loaded a file with that name.

For instance, file A is loaded into the target table. If we receive file A again on a subsequent run, the job sequence should be aborted because that file has already been loaded.

  2. Whether we have already loaded a file with identical records.

For instance, file A is already in the target table, and the next time we receive file B. Any records in file B that were already loaded into the target with file A should not be loaded, and the job should be aborted.

Can anyone help me with this scenario?

Thanks, Venkat.

Krishna

1 Answer


You need to keep a record of which file names have been loaded, typically by moving each file to an archive (or "processed") directory once it has been loaded. For your first requirement, you can then use a simple ls command on the incoming file name to check whether it already exists in that directory. Determining whether file B contains records identical to those in file A is a harder question. Could you use the diff command? Otherwise you may need something more sophisticated. Even before that, how do you establish that file A is the one you have to compare against? If the records have key values, you may be able to check them against the target table instead.
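A minimal Python sketch of both checks, under some assumptions: loaded files are moved into an archive directory, the incoming file is CSV, and `loaded` is a set of records (or key tuples) that you have already fetched from the target table. How you fetch that set is not shown. All function names here are hypothetical. The non-zero return value is intended to be used as the script's exit status, so that whatever calls it can abort.

```python
import csv
import os
import shutil
import sys
from typing import Iterable, Optional, Sequence

# Hypothetical helpers: the archive directory layout and the source of `loaded`
# (e.g. rows or keys queried from the target table) are assumptions.


def already_loaded(file_name: str, archive_dir: str) -> bool:
    """Requirement 1: treat a file as already loaded if a file with the same
    name is present in the archive ("processed") directory."""
    return os.path.exists(os.path.join(archive_dir, os.path.basename(file_name)))


def archive_file(path: str, archive_dir: str) -> None:
    """Move a file into the archive directory after a successful load, so the
    next run's already_loaded() check can find it."""
    os.makedirs(archive_dir, exist_ok=True)
    shutil.move(path, os.path.join(archive_dir, os.path.basename(path)))


def find_duplicate_records(
    lines: Iterable[str],
    loaded: set,
    key_columns: Optional[Sequence[int]] = None,
) -> list:
    """Requirement 2: return the CSV rows in `lines` whose key is already in
    `loaded`. With key_columns=None the whole row is the key, so only fully
    identical records count as duplicates."""
    duplicates = []
    for row in csv.reader(lines):
        key = tuple(row) if key_columns is None else tuple(row[i] for i in key_columns)
        if key in loaded:
            duplicates.append(row)
    return duplicates


def check_incoming_file(incoming: str, archive_dir: str, loaded: set) -> int:
    """Return 1 (abort) if either check fails, otherwise 0."""
    if already_loaded(incoming, archive_dir):
        print(f"Abort: {os.path.basename(incoming)} was already loaded", file=sys.stderr)
        return 1
    with open(incoming, newline="") as f:
        dupes = find_duplicate_records(f, loaded)
    if dupes:
        print(f"Abort: {len(dupes)} record(s) already in the target", file=sys.stderr)
        return 1
    return 0
```

For example, a wrapper script could end with sys.exit(check_incoming_file(path, archive_dir, loaded)) and call archive_file(path, archive_dir) only after the load succeeds. Checking keys against the target table (via key_columns) avoids having to decide which earlier file to diff against.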