2

I use SQL Server 2012 and SSIS. I have two Tables in the same server and same database. I need to transfer all records of both tables into third table.

I need to add some columns (Like Execution ID and some Package Parameters) to result of UNION ALL and after that I have to transfer the records into third table.

I have two solution for doing that but I don't know which ones is more efficient.

Solution 1 : Use two OLE DB DataSources and use Union All Component in SSIS

enter image description here

Solution 2 : Use Union All in SQL Server side and use just one OLE DB Source in SSIS.

enter image description here

Which one is more efficient?

Ardalan Shahgholi
  • 11,967
  • 21
  • 108
  • 144
  • 1
    I'm betting the latter but have you noticed a time difference? Just makes more sense to me to make fewer calls to your data source. – S3S Apr 07 '17 at 20:35
  • 1
    I should say your second option is more efficient because you are doing the union on the sql database and you are making only one call to the database. Is there a big time difference in execution? – Kevin Apr 07 '17 at 20:36
  • Right now, My tables have less than 10000 rows each ones. But in future they will have more than million rows. Because of that I want to chose the best practice here and in some package I need to UNION more than two tables – Ardalan Shahgholi Apr 07 '17 at 20:39
  • 1
    IMO you should only use the Union all data flow component when you need to combine data from two different sources, and even then I'd probably preferring pulling the data into landing tables and doing the UNION ALL on the target server. – mallan1121 Apr 07 '17 at 20:49
  • If the 3rd table is also on the same server then why use SSIS at all? Can't you just write a simple SQL statement to do all of this? – Anthony Hancock Apr 07 '17 at 20:57
  • @AnthonyHancock : this is a tiny part of my package. there is a lot of transformation in this package and at the end I should create the jobs for my packages and there are other columns that must be added in package. – Ardalan Shahgholi Apr 08 '17 at 13:08
  • I try to use SSIS as little as possible, but if it's that integral to your processs, I would definitely recommend the second option. – Anthony Hancock Apr 08 '17 at 20:50

3 Answers3

3

Always favor database operations when possible. If 2 tables are in the same database, there is absolutely no reason to favor an SSIS operation over the query optimizer. Union All is a no blocking operation, so there will be almost no difference in this case, but if this was a join or a more complex operation then the query optimizer would come into play.

Use the database solution as a rule of thumb.

  • One reason you might do it in SSIS is that you prefer the development and debugging style of SSIS. But for performance, sure, for something like this in the same DB you'd favour the DB query. – Rich May 09 '17 at 14:33
  • Good answer. I just want to point out that union all is a partially blocking transformation, not a non blocking transformation. – Khaled Jul 16 '17 at 23:54
0

You can also consider doing it in simple Execute SQL task.

Insert into T3
select * from T1
union all
select * from T2
Srinivasa Rao
  • 172
  • 2
  • 11
0

Sorry to disagree with Cafe Con Leche but once you are comfortable with SSIS you should always use an SSIS task over TSQL. Except for sort operations SSIS is usually faster and provides very easy to set up error handling (including 'bad' rows being redirected to be processed/fixed later) and logging. Almost anything that can be done in SSIS can be done in TSQL (especially if you are comfortable using sp_cmdshell) but its much harder to do logging and error handling in TSQL that is built into SSIS.

benjamin moskovits
  • 5,261
  • 1
  • 12
  • 22
  • 1
    I agree with everything you're saying here apart from the 'SSIS is usually faster' bit. The choice is more often down to what fits your skillset the best. The advantages of error handling and auditing are definitely there for SSIS, as is a clear visual style. I prefer SSIS for ETL, for sure. But to say SSIS is usually faster isn't true- it really wouldn't be faster for things like joins to other tables in the same database and union alls. It can be faster for many situations- as usual it depends. I really don't get people saying you should always use T-SQL over SSIS, or SSIS over T-SQL. – Rich May 09 '17 at 14:32
  • Doing joins against very large tables and then doing Transform operations against the resulting set, in my experience, is usually faster in SSIS. The reason that the tSQL solution is getting so many votes is because to a carpenter whose only tool is a hammer everything looks like a nail. When users only know SSIS through the gui they are scared to use it to create packages directly. – benjamin moskovits May 09 '17 at 14:40
  • SSIS is very efficient, and does things very well. I agree with you about the reasons why some people prefer the T-SQL: its because its what they know. But I'm not sure that you're right about the in database joins/transforms being slower than SSIS. It will depend on the use case. – Rich May 09 '17 at 14:43
  • When you have a large package and you have many SQL task steps its challenging to optimize it and log the steps. It would be interesting for someone to actually try various tasks in TSQL and SSIS to see the timing. If someone was to do this the results would only be relevant if the tables were very large. – benjamin moskovits May 09 '17 at 14:46
  • The example in the original question is trivial and it really makes no difference using one over the other. What I suggested was a "rule of thumb". If all your tables come from the same database, the query optimizer will be faster in most cases. For joins and other transformations the query optimizer will use the existing indexes and statistics to create an appropriate execution plan. If your use case includes redirecting rows an other specific etl operations, then of course you will need ETL constructs to solve your scenario. – Fernando Sibaja May 09 '17 at 17:28