
I am working with SSIS 2014. I have several dozen million rows coming from an Attunity Oracle Source component, whose source query has 20 columns and is time-consuming. I need to insert 2 columns of the same query into another table, which has a foreign key reference to the first table.

I would like to avoid calling the query a second time in another Data Flow Task, so I was thinking about creating a "WAIT till x_01 finishes" Script Component, which would detect when the x_01 OLE DB Destination finishes and then proceed with the insert into the second table.

Here is a diagram of what I want to achieve: [screenshot]

I was thinking about using an x_01_Finished package-scope variable and changing its value when the PostExecute event fires on x_01, but unfortunately PostExecute is not available at the x_01 OLE DB Destination level.

Question 1:

Is there a way to get the status of another component from inside Script Component code?

Something like the following:

public override void PreExecute() {
    base.PreExecute();
    int sec = 5; // polling interval, in seconds, to check the status of the x_01 component
    int ms  = sec * 1000;
    bool lbo_x01_Finished = false;
    do {
        System.Threading.Thread.Sleep(ms);
        // Hypothetical helper: would return true once the x_01 component has finished
        lbo_x01_Finished = IsComponent_Finished("x_01");
    } while (!lbo_x01_Finished);
}

Question 2:

Do you think the package performance will be significantly affected by using this technique?

UPDATE 1

I already tried Nick.McDermaid's suggestion with a really tiny amount of data. I created 2 temporary tables: the first with a PK in its first column, and the second with an FK to that PK of the first table. Even with this small amount of data I got this error: The INSERT statement conflicted with the FOREIGN KEY constraint "FK_tmp_1_tmp_2". You can see in the following screenshot that the insert is indeed really fast, and on the second chunk of data the x_02 object attempted to insert before the first one (which was not possible due to the FK).

So, my next idea is to sleep for some milliseconds while reading just the first row, before inserting into the second table. We will see if this helps.
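A minimal sketch of that idea, as a synchronous Script Component placed on the path feeding x_02 (the class and buffer names are the ones SSIS generates for a Script Component transformation; the 2000 ms delay value is just a guess that would need tuning per environment):

```csharp
// Synchronous Script Component transformation placed before the x_02 destination.
// Sleeps once, on the first row only, so the x_01 insert gets a head start.
// Sketch only: this depends on the SSIS runtime and the component's generated metadata.
public class ScriptMain : UserComponent
{
    private bool firstRow = true;

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        if (firstRow)
        {
            firstRow = false;
            System.Threading.Thread.Sleep(2000); // delay in milliseconds; tune per environment
        }
        // All rows then pass through unchanged to the x_02 destination.
    }
}
```

Note this only delays the start of the second insert; it does not guarantee x_01 has finished, so FK conflicts are still possible on later buffers.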

[screenshot]

OscarSosa
  • I think you're overcomplicating it. Why not use a multicast and feed them in parallel? Or use an insert/select afterwards – Nick.Mc Jan 24 '18 at 10:27
  • I want to avoid the select/insert afterwards to _save time_ in the whole migration; I already have the data in memory, so... Regarding the multicast, it is already there, but it needs to insert into the first table and then into the second one. – OscarSosa Jan 24 '18 at 10:35
  • You might be surprised how fast the insert/select works. – Nick.Mc Jan 24 '18 at 12:44
  • Yeah, I am. For now, I've been using the select/insert in the next Data Flow Task for a long time. Just lately, I've been wondering if such an idea is possible. From time to time I need to migrate to Azure databases, and they are way slower than our internal environments. Another issue is the quality of the hard disks we have with other customers. E.g., for a similar example, there are times when we are able to insert 20 million rows in 20 min, while 9 million rows took 45 min in another environment. Thanks anyway for your recommendations, Nick =] – OscarSosa Jan 24 '18 at 12:59
  • And I will not only _save time_ on the whole migration, I will also save the Oracle DB the workload of the second select. – OscarSosa Jan 26 '18 at 08:35
  • The buffers in the data flow load and throw away data. You never have the full set of data in memory - it feeds in and out of buffers. When x_01 is finished, the only data you have in memory is the last batch of rows that fit into the destination buffer. You may (or may not) see some performance increase if you do whatever is in those derived columns in the source query. Unless of course that info is only available in the package. – Nick.Mc Jan 26 '18 at 10:40
  • 1
    You could use some kind of partially blocking transformation (or make your own in a script transformation) to delay the second set of data. Or you could disable FK's while you load. (that speeds things up also and is pretty standard practice). I suggest disabling FK's. Having said that, it's strange to have an FK relationship between two apparently identical sets of data – Nick.Mc Jan 26 '18 at 10:43
  • Thanks for this last comment! It is really useful, indeed! =] – OscarSosa Jan 26 '18 at 10:45

0 Answers