0

I have a simple ETL job copying data from MS SQL to DB2 using DataStage. I need to update a column in MS SQL, "SenttoDB2" once I have successfully copied the data to DB2.

I figured that I just need to create another stage after DB2 and pass the "key" from the source in the update SQL to update the column. Is this correct or am I missing a step somewhere?

  • Seems to me you need to "mark" when extracted, not when updated. With that approach, after you know the update is successful you run a trivial query to update everything that was marked - no need to know a potentially giant set of key values. – SMor Feb 02 '21 at 11:51
  • How do you "mark"? what stage type to I use? I figured I needed to know that the transfer to DB2 is successful before I can "update" the flag in the source DB. –  Feb 02 '21 at 13:42
  • 1
    Are you using a server jobs? In general I would recommend to use another strategy updating the source is some extra effort that is not necessarily needed for a reliable delta strategy. – MichaelTiefenbacher Feb 02 '21 at 19:51
  • @MichaelTiefenbacher, I am using Parallel Job. –  Feb 03 '21 at 04:23
  • One more question I have is how to I do update and insert in the same stage? I may have update records that may need to updated and not necessarily insert. –  Feb 03 '21 at 04:24
  • Use 2 sequential jobs. The 1-st one does MSSQL -> DB2 inserts / updates. The 2-nd one MSSQL -> MSSQL updates. You may either query MSSQL twice with the same SELECT, if you are sure, that the source set of rows is the same our use a DataSet to store the IDs selected in the 1-st job. The reason of 2 jobs use is, that rows are processed in batches usually, and it's hard to process possible errors with just a single job. You may turn off batch processing, but I'd not recommend doing so. – Mark Barinstein Feb 03 '21 at 08:09

1 Answers1

0

You could add an after stage update sql to the source DB2 stage. The sql will get written when the data is pulled, but it will be rolled back on job failure.

If the timing is exceptionally important then you will need to create a second job that updates the source table after the job completes.

  • I will go with the after stage approach. It means that if the insert to Db2 stage fail, the job will stop anyway and will not update the MS SQL database. –  Feb 05 '21 at 01:21