
I have two tables. Table 1: the source, in Oracle Cloud. Table 2: in SQL Server 2016 on an on-prem server. We have a request to update the on-prem data with the cloud data based on some joining column(s). The question is: how do I update the on-prem data via OPENQUERY without landing the cloud data on my on-prem server? In my test case I have a few thousand rows, but in reality there are close to 44M rows in the cloud source. That said, here is a tech example and what I have tried thus far.

  SELECT *
  INTO   #On_PremTable
  FROM   OPENQUERY([111.222.555.999/PXX],
           'select * from trac where rownum <= 500');

This works.

  UPDATE a
  SET    a.ARTCEXR = c.ARTCEXR,
         a.ARTGEST = c.ARTGEST
  FROM   #On_PremTable AS a
  JOIN   OPENQUERY([111.222.555.999/PXX],
           'select * from trac where rownum <= 500') AS c
         ON a.ARTCINR = c.ARTCINR;

This also works. But how do I check or update rows in #On_PremTable when there are a few thousand, or even millions? Is there a way for me to make #On_PremTable part of the OPENQUERY so that this join, for lack of a better word, happens on the cloud side, and only a small result set is pulled back to update, rather than a few thousand or 44M records?
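One way to keep the transfer small, sketched below under the assumption that the set of join keys is small and known in advance, is to name only the needed columns and inline the key filter into the remote query text, so Oracle does the filtering before anything crosses the network. The key values `'K001'` etc. are placeholders, and Oracle caps an `IN` list at 1,000 literals, so this does not scale to millions of keys:

```sql
-- Sketch: restrict the remote query to the needed columns and filter on
-- the join keys inside the OPENQUERY string, so Oracle returns only the
-- matching rows. Placeholder key values; only viable for small key sets.
UPDATE a
SET    a.ARTCEXR = c.ARTCEXR,
       a.ARTGEST = c.ARTGEST
FROM   #On_PremTable AS a
JOIN   OPENQUERY([111.222.555.999/PXX],
         'select ARTCINR, ARTCEXR, ARTGEST
          from   trac
          where  ARTCINR in (''K001'', ''K002'', ''K003'')') AS c
       ON a.ARTCINR = c.ARTCINR;
```

For an arbitrary or large key set there is no way to inline the local table into the remote query, which is why the comments below converge on staging the remote data locally instead.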

junketsu
  • Some questions: does the data only get changed on the remote server (Oracle)? If so, is there a way to determine which rows have changed? I'm asking because whether you pull (do the join on the on-prem SQL Server table) or push (do the join on the Oracle DB), it will still be a huge burden for that number of rows. ALL the data has to be pushed or pulled to make the join, so if there is a way to filter only the changed rows on either side, that might help. – LaraRaraBoBara Aug 07 '19 at 17:35
  • Hi @FembotDBA, thank you for the suggestion. Sadly, I have NOT been able to obtain any such information: Change Data Capture, a row-version column, start/stop dates for records, a timestamp, etc. – junketsu Aug 07 '19 at 17:49
  • Understood. From what your 2nd query shows, you only need three columns from the Oracle table, so if I were limited to your options I would bring all 44M rows over into a "real" table (not a temp table), with only the three columns needed. Perhaps do a SELECT ... INTO and recreate the table each time, add an index on ARTCINR in the new table, then update locally. If you try to join on the remote data, ALL the data will have to be pushed or pulled anyway. Certainly don't do a SELECT * as in your first query, unless the table ONLY has the three columns. – LaraRaraBoBara Aug 07 '19 at 18:04
  • There are about 25-30 columns, so of course I will limit the ones I bring in (all or certain ones, depending on the requirement). Again, thank you for the suggestion: this is essentially a pull of the cloud data into a persisted table on-prem, and then an update from this pulled_cloud_data_Table to the On_Prem table. – junketsu Aug 07 '19 at 18:07
  • You can try using the SSIS "fast load" mode to pull the cloud table with only the columns needed; it should load the table very fast. – Adam Yan Aug 07 '19 at 18:47
  • @AdamYan it seems that is the route to go, since no method of change tracking for the specific rows where data is updated has been shared with me. NOT an optimized solution, but we work within the parameters of both systems as provided to us. Thank you. – junketsu Aug 07 '19 at 18:49
  • If you want to minimize writing on the on-premise server, see this for how to upsert with SSIS: http://azkhairuzzaman.blogspot.com/2015/09/ssis-perform-upsert-updateinsert-using.html – Adam Yan Aug 07 '19 at 18:57
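Following the staging suggestion in the comments, a minimal sketch of the pull-and-update workflow, assuming a made-up staging table name `dbo.trac_stage` and a persisted on-prem target `dbo.On_PremTable`:

```sql
-- Sketch: pull only the needed columns into a persisted staging table,
-- index the join key, then update locally. Names other than the question's
-- table/column names are assumptions.
IF OBJECT_ID('dbo.trac_stage') IS NOT NULL
    DROP TABLE dbo.trac_stage;

-- Recreate the staging table on each run with just the three columns.
SELECT *
INTO   dbo.trac_stage
FROM   OPENQUERY([111.222.555.999/PXX],
         'select ARTCINR, ARTCEXR, ARTGEST from trac');

-- Index the join key so the local update can seek instead of scan.
CREATE INDEX IX_trac_stage_ARTCINR ON dbo.trac_stage (ARTCINR);

-- Update entirely on-prem; no further remote round trips.
UPDATE a
SET    a.ARTCEXR = s.ARTCEXR,
       a.ARTGEST = s.ARTGEST
FROM   dbo.On_PremTable AS a
JOIN   dbo.trac_stage   AS s
       ON a.ARTCINR = s.ARTCINR;
```

The 44M-row transfer still happens, but only once per refresh and only for the three columns, which is the trade-off the comments describe.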

0 Answers