It seams you are doing an incremental update. There is a special step, named Merge Rows (Diff)
, to compare two streams and tell if they exists in both streams and if they have changed.
The two streams, a reference stream (the current data) and a compare stream (the new data), are merged. The row are merged and marked as :
- identical The key was found in both streams and the values to compare are identical;
- changed The key was found in both streams but one or more values is different;
- new The key was not found in the reference stream;
- deleted The key was not found in the compare stream.
The two streams must be sorted before to be merged. You can do this in the sql query, but it is best to put an explicit Sort row
step, because otherwise the process will stop after reading 1000 records, or whatever is in the Rowset limit (seams familiar?).
The stream is then directed with a Swich/Case
step to the appropriate action.
The "best pratice" pattern is as follows, in which I added a first step to get the max date, and a step to built the surrogate key.

This pattern has been proven since years as much faster. In facts, it avoids the very slow Database lookup
(one db full search by row) and reduce the use of the slow Insert/Update
step (3 access to the db: one to fetch the record, one to change the values and one to store it). And the sort (which can be pre-prepared in the stream) is pretty efficient.