How to divide table for parallel loading

Question

Which division option is better (for performance purpose) based on clustered index column or on partition (the same column)?

I have to split table to load it in parallel using SSIS (SQL Server 2008R2 Enterprise Edition) to Oracle 11.

First option creates Clustered Index Seek based on where conditions and second creates Clustered Index Scan on partition.

RichardCL · Accepted Answer · 2016-05-01T10:22:58.720

If the partitions in the source database are on different disks, then you could take advantage of this to do parallel reading. In your SSIS package, you would have multiple Data Sources, each reading from a different partition. Then you could merge the streams together (Union All transform) before loading.

However your target system (Oracle?) might become a bottleneck in this case, or indeed the computer running SSIS, or the network. Generally it is best to run SSIS on a separate computer with sufficient processor power and RAM, so it doesn't take resources from SQL Server.

If you are running SQL Server Enterprise Edition, then the Query Optimizer may well decide to make a parallel execution plan, because of the large volume of data. You should examine the graphical execution plan to see what it is doing.

You need to test to determine where the bottlenecks are. Then optimize to remove each bottleneck. Once you clear one bottleneck, another part of the overall system will become the limiting factor.

You could do some simple timing tests like this. First single stream.

Then parallel stream. That would allow you to measure how your SQL Server computer is performing and whether there is any benefit in using parallel streams.

There's a course on Microsoft Virtual Academy: "Designing BI Solutions with Microsoft SQL Server" and the fifth module "Design an ETL Solution" talks about different strategies for loading using SSIS.

Also an important point: you need to run the SSIS package on a server for maximum performance. If you run the package within Visual Studio (SQL Server Data Tools), the performance will be very much reduced.

What's about if I just create many (2 or 3) Data Flows blocks with different where condition (just dividing table based on one column) and run them in parallel? This will be faster than one Data Flow block (whole table)? — PNPTestovir, Apr 29 '16 at 12:41
I've updated my answer to show how you could test this. It's only by testing that you will find out the answer. — RichardCL, Apr 29 '16 at 12:53

How to divide table for parallel loading

1 Answers1