
So I have a script:

select * 
into my_archive_05302013
from archive_A
where my_Date = '05/18/2013'

and:

insert into archive_B (ID,my_date,field_n )
select ID,  my_Date,field_n from my_archive_05302013

where the n in field_n goes up to about 100 or so; in other words, there are more than 100 columns in the table that I am loading.

These run pretty fast; the query inserts about 200,000 records. my_Date has a non-clustered index on it in table archive_A.

Now, when I create a data flow using SSIS 2008, it takes HOURS to complete.

I have the following in my OLE DB source:

SELECT * FROM Archive_A
WHERE My_Date = (SELECT MAX(My_Date) from Archive_A)

and for OLE DB Destination:

Data access mode: "Table or view - fast load"
Name of the table: archive_B
"Table lock" and "Check constraints" are checked

Anyone know what the problem could be?

Thanks in advance

user2129585
  • First off, you're not doing the same thing. Your first query goes to a specific date in your archive table and writes that into another table. You then pull 3 columns from that much smaller table into the final destination. Contrast that with your SSIS approach, wherein you're grabbing every field on your archive table; if that's more than those 3 columns, you're going to have to read a lot more data than is required. You'll also have to compute the maximum date from the archive table vs. a fixed point in time. Finally, the first does not have to factor in network latency; the second might. – billinkc May 30 '13 at 14:57
  • Hmm, Data Flow should beat SELECT INTO? You could use an Execute SQL Task to obtain the MAX(My_Date) and store it in a variable, and then use that variable in the OLE DB Source in the Data Flow (sketched after these comments). (Incidentally, your title should read "SSIS Data Flow..." instead of "SSIS Task Flow...".) That just separates out the subquery. The other thing would be to check the metadata on the arrow from the Source to the Destination: do all 100 columns have the correct data types? You can also add a [Data Viewer](http://msdn.microsoft.com/en-us/library/ms140318.aspx) and watch the data as it goes by. – criticalfix May 30 '13 at 15:03
  • I thought you were transferring all 100+ columns in both cases, and you were just showing an abbreviated version of the script. If you only need a subset of the columns, then you definitely need to tell SSIS about that before it goes and moves gigabytes ... – criticalfix May 30 '13 at 15:06
  • Yes, I am showing an abbreviated version of the script, and I have also tried separating out getting the max date using a SQL task and storing it in a variable; no difference. – user2129585 May 30 '13 at 15:11
  • @billinkc Network latency? Interesting; that may be the problem, as my source and destination databases are on separate servers. – user2129585 May 30 '13 at 15:24
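
A minimal sketch of criticalfix's suggestion, assuming a package variable named User::MaxDate (the name is an assumption; the result-set and parameter mappings themselves are configured in the Execute SQL Task and OLE DB Source editors):

-- Execute SQL Task: Result Set = Single row, result mapped to User::MaxDate (assumed name)
SELECT MAX(My_Date) AS MaxDate FROM Archive_A

-- OLE DB Source: Data access mode = SQL command, with the ? parameter mapped to User::MaxDate
SELECT * FROM Archive_A WHERE My_Date = ?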

1 Answer


The problem is that, because you are using a data source and a data destination, you are pulling all of the data out of the database and then putting it all back in again, whereas your INSERT statement keeps it all contained within the database. Use an Execute SQL Task with your INSERT statement instead.
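
As a minimal sketch, the Execute SQL Task's SQL Statement could combine the question's source filter and insert in one statement (field_n again stands in for the remaining ~100 columns; this assumes both tables are reachable on one connection, which the comments suggest may not hold if the servers are separate):

INSERT INTO archive_B (ID, my_Date, field_n)
SELECT ID, my_Date, field_n
FROM Archive_A
WHERE My_Date = (SELECT MAX(My_Date) FROM Archive_A)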

Dave Sexton
  • The problem is that in the package he's not limiting the record set, whereas in SQL he is. – Zane May 30 '13 at 15:09
  • @davesexton I am using a Data Flow Task because I am using the inserted row count for auditing purposes. – user2129585 May 30 '13 at 15:13
  • Or you could do a SELECT COUNT(*) after the SELECT INTO, which should accomplish the same thing for auditing purposes. – criticalfix May 30 '13 at 15:21
  • @criticalfix True, I can work around the problem; I just wanted to understand the reason for the huge difference in processing time. – user2129585 May 30 '13 at 15:38
  • To get the row count, add a variable, change the Execute SQL Task to be single row, in the Result Set map your variable to ROWCOUNT, and then after your INSERT in the SQL Statement add SELECT @@ROWCOUNT AS ROWCOUNT (sketched below). Job done. – Dave Sexton May 30 '13 at 16:04
  • Thanks Dave, I still don't fully understand why it's different, but as there's a workaround I'm marking the answer as correct. – user2129585 May 30 '13 at 20:18
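
A minimal sketch of the SQL Statement Dave Sexton describes, with ROWCOUNT mapped in the Result Set pane to a package variable (the variable name, e.g. User::RowsInserted, is an assumption):

INSERT INTO archive_B (ID, my_Date, field_n)
SELECT ID, my_Date, field_n
FROM Archive_A
WHERE My_Date = (SELECT MAX(My_Date) FROM Archive_A);

-- @@ROWCOUNT still reflects the INSERT immediately above
SELECT @@ROWCOUNT AS ROWCOUNT;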