8

I am new to SSIS and have a pair of questions:

  1. I want to transfer 125,000 rows from one table to another in the same database, but when I use a Data Flow Task it takes too much time. I tried an ADO NET Destination as well as an OLE DB Destination, but the performance was unacceptable. When I wrote the equivalent query inside an Execute SQL Task, it performed acceptably. Why is there such a difference in performance?

    INSERT INTO table1 SELECT * FROM table2;

  2. Based on the first observation, I changed my package. It is now composed exclusively of Execute SQL Tasks, each running either a direct query or a stored procedure. If I can solve my problem using only Execute SQL Tasks, why would one use SSIS at all? So many documents and articles describe it as reliable, easy to maintain, and comparatively fast.

Hadi
  • 36,233
  • 13
  • 65
  • 124
Teju MB
  • 1,333
  • 5
  • 20
  • 37
  • Have you tried breaking the transfer into blocks, e.g. 10K rows at a time? Logging can be traumatic for large single operations. – HABO Apr 24 '13 at 02:38
  • While using the Data Flow Task, it already breaks the rows into 10K batches and then inserts them; is this what slows it down? – Teju MB Apr 24 '13 at 02:59
  • 3
    Did you select 'fast load' in the oledb destination? – Maximus Apr 24 '13 at 04:54
  • Yeah, I have used the fast load also. But after reading the answer posted by billinkc, I left it as all Execute SQL Tasks and ran it through SQL Agent. I am now satisfied! – Teju MB Apr 24 '13 at 06:00
  • While this is an old thread, I recently ran into this. I had 'Lookup' tasks that executed poorly on two systems versus two others. Likely one of the reasons outlined below, but I solved it by refactoring all of the lookups into SQL Tasks that used a combination of JOINs and CTEs. I shaved the first bulk run from 30+ hours down to 6 hours. Huge win. Only where the data flow added columns did I leave those in place. – D-Klotz Nov 19 '21 at 14:31

2 Answers

29

Difference in performance

There are many things that could cause the difference in performance between a "straight" Data Flow Task and the equivalent Execute SQL Task.

  1. Network latency. You are performing an insert into table a from table b on the same server and instance. In an Execute SQL Task, that work is performed entirely on that machine. By contrast, I could run a package on server B that queries 1.25M rows from server A, which are then streamed over the network to server B; that data is then streamed back to server A for the corresponding INSERT operation. If you have a poor network, wide data (especially binary types), or simply great distance between servers (server A is in the US, server B is in India), performance will be poor.
  2. Memory starvation. Even if the package executes on the same server as the target/source database, it can still be slow because the Data Flow Task is an in-memory engine: all of the data flowing from the source to the destination passes through memory. The more memory SSIS can get, the faster it's going to go. However, it has to fight for memory allocations with the OS as well as with SQL Server itself. Even though SSIS is SQL Server Integration Services, it does not run in the same memory space as the SQL Server database. If your server has 10GB of memory and the OS uses 2GB while SQL Server has claimed 8GB, there is little room for SSIS to operate. It cannot ask SQL Server to give up some of its memory, so the OS will have to page out while trickles of data move through a constricted data pipeline.
  3. Shoddy destination. Depending on which version of SSIS you are using, the default access mode for an OLE DB Destination was "Table or View." This was a nice setting to try to prevent a low-level lock escalating to a table lock. However, it results in row-by-agonizing-row inserts (1.25M individual INSERT statements being sent). Contrast that with the set-based approach of the Execute SQL Task's INSERT INTO. More recent versions of SSIS default the access mode to the "fast load" version of the destination. This behaves much more like the set-based equivalent and yields better performance.
  4. OLE DB Command Transformation. There is an OLE DB Destination and some folks confuse that with the OLE DB Command Transformation. Those are two very different components with different uses. The former is a destination and consumes all the data. It can go very fast. The latter is always RBAR. It will perform singleton operations for each row that flows through it.
  5. Debugging. There is overhead to running a package in BIDS/SSDT. The package execution gets wrapped in the DTS Debugging Host, which can cause a "not insignificant" slowdown. There's not much the debugger can do about an Execute SQL Task (it runs or it doesn't), but for a Data Flow there's a lot of memory it can inspect and monitor, which reduces the memory available (see point 2) and also slows it down because of the assorted checks it's performing. To get a more accurate comparison, always run packages from the command line (dtexec.exe /file MyPackage.dtsx) or schedule them from SQL Server Agent.
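To make point 3 concrete, here is a T-SQL sketch of the set-based approach, plus the batching idea HABO raised in the comments. Table names come from the question; the `id` column and the 10K batch size are assumptions for illustration:

```sql
-- Set-based load in one statement. With a TABLOCK hint (and a SIMPLE or
-- BULK_LOGGED recovery model), SQL Server may minimally log the insert.
INSERT INTO table1 WITH (TABLOCK)
SELECT * FROM table2;

-- Batched variant: move 10K rows at a time so no single transaction
-- bloats the log. Assumes table2 has a key column (here, id) that
-- identifies rows not yet copied into table1.
WHILE 1 = 1
BEGIN
    INSERT INTO table1
    SELECT TOP (10000) t2.*
    FROM table2 AS t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 AS t1 WHERE t1.id = t2.id);

    IF @@ROWCOUNT = 0 BREAK;
END
```

Either form keeps the work entirely inside the database engine, which is why it avoids the network and memory costs described in points 1 and 2.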

Package design

There is nothing inherently wrong with an SSIS package that is just Execute SQL Tasks. If the problem is easily solved by running queries, then I'd forgo SSIS entirely, write the appropriate stored procedure(s), schedule them with SQL Agent, and be done.
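For the query-only route, a minimal sketch of what "stored procedure plus SQL Agent" looks like; the procedure and job names are invented, and a real job would also need sp_add_jobserver and a schedule:

```sql
-- Hypothetical wrapper procedure for the load.
CREATE PROCEDURE dbo.usp_LoadTable1
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO table1
    SELECT * FROM table2;
END
GO

-- Register a SQL Agent job that calls it (names are placeholders).
EXEC msdb.dbo.sp_add_job
    @job_name = N'Load table1';
EXEC msdb.dbo.sp_add_jobstep
    @job_name  = N'Load table1',
    @step_name = N'Run load proc',
    @subsystem = N'TSQL',
    @command   = N'EXEC dbo.usp_LoadTable1;';
```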

Maybe. What I still like about using SSIS even for "simple" cases like this is that it ensures a consistent deliverable. That may not sound like much, but from a maintenance perspective, it can be nice to know that everything mucking with the data is contained in these source-controlled SSIS packages. I don't have to remember, or train the new person, that tasks A-C are "simple" so they are stored procs called from a SQL Agent job; tasks D-J, or was it K, are even simpler than that, so they're just "in line" queries in the Agent jobs to load data; and then we have packages for the rest of the stuff. Except for the Service Broker thing and some web services, those too update the database. The older I get and the more places I get exposed to, the more I find value in a consistent, even if overkill, approach to solution delivery.

Performance isn't everything, but the SSIS team did set the ETL benchmarks using SSIS, so it definitely has the capability to push some data in a hurry.

As this answer grows long, I'd simply leave it at this: the advantages of SSIS and the Data Flow over straight TSQL are native, out of the box

  • logging
  • error handling
  • configuration
  • parallelization

It's hard to beat those for my money.

billinkc
  • 59,250
  • 9
  • 102
  • 159
  • thanks for such a lucid explanation and solution. Thank you very much. – Teju MB Apr 24 '13 at 05:35
  • I am done with SSIS but when i schedule it from sql server it fails to execute, please check this link for my post. http://stackoverflow.com/questions/16766416/the-job-failed-the-job-was-invoked-by-useruser-the-last-step-to-run-was-step – Teju MB May 27 '13 at 05:24
-1

If you are passing SSIS variables as parameters in the Parameter Mapping tab and assigning values to those variables via expressions, then your Execute SQL Task can spend a lot of time evaluating those expressions. Use an Expression Task (separately) to assign the variables instead of setting an expression on the variable itself.

ADIL
  • 1