
My company is cursed by a symbiotic partnership turned parasitic. To get our data from the parasite, we have to use a painfully slow ODBC connection. I did notice recently, though, that I can get more throughput by running queries in parallel (even on the same table).

There is a particularly large table that I want to extract data from and move into a local table. Running queries in parallel gets the data faster, but I imagine this could cause issues when writing data from multiple queries into the same table at once.

What advice can you give me on how to best handle this situation so that I can take advantage of the increased speed of using queries in parallel?

EDIT: I've gotten some great feedback here, but I think I wasn't completely clear on the fact that I'm pulling data via a linked server (which uses the ODBC drivers). In other words, I can run normal INSERT statements, and I believe that would provide better performance than either SqlBulkCopy or BULK INSERT (in fact, I don't believe BULK INSERT would even be an option).
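To make the setup concrete, here is a minimal sketch of the kind of statement involved, assuming the parallelism comes from splitting the source table into non-overlapping key ranges and running one range per connection. The linked server, database, table, and column names are all hypothetical:

```sql
-- Hypothetical linked server LINKEDSRV and hypothetical table/column names.
-- Each worker connection runs this with its own key range, so the slow
-- remote reads over ODBC happen in parallel.
INSERT INTO dbo.BigTableLocal (Id, Col1, Col2)
SELECT Id, Col1, Col2
FROM LINKEDSRV.SourceDb.dbo.BigTable
WHERE Id >= 1 AND Id < 1000000;  -- range assigned to this worker
```

Whether the concurrent writes into one local table contend with each other is exactly the concern raised above: inserts into an unindexed heap generally coexist well, while a clustered index can introduce page-latch contention at the insert point.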

Brandon Moore
  • As you suggested, parallel writes will probably not help performance. You might find some answers here http://stackoverflow.com/questions/2861944/how-do-i-do-very-fast-inserts-to-sql-server-2008 – John Watts Jun 19 '12 at 03:42
  • Are you migrating the data? I ask because you said you're moving data to local tables. Also, how are you doing this in code, e.g. C#, or as part of a SQL job? – Chris Moutray Jun 19 '12 at 04:45
  • @mouters Yes, but new data doesn't come into our database, so I will have to copy anything new over on a daily basis. So far I've been doing it in SSMS by hand, but the parallel idea will be done in C# if I do that. Probably not going to do this one in parallel b/c I've got all the historical data now and really just need to set up a SQL job to run and get new data nightly. But I just wanted to get some info on the idea b/c I may have use for it in the near future on other stuff. – Brandon Moore Jun 21 '12 at 07:46
  • Not really sure if this is applicable but did you consider storing your parallel queries' results into separate local tables to "concatenate" them afterwards? – Andriy M Jun 21 '12 at 14:01
  • @AndriyM Actually I did think about that... and then I wondered if that would be helpful if everything is being done on the same hard drive, or if I'd need tables to be on separate hd's for that to be a significant help. I also thought about storing into in-memory table variables which I think would be faster but also require more development effort from me. – Brandon Moore Jun 22 '12 at 15:01
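A rough sketch of the separate-tables idea from the comments, with hypothetical table names: each worker fills its own staging table in parallel, and one final local statement concatenates them (the local copy is cheap compared to the ODBC pull):

```sql
-- Worker 1 and Worker 2 each load their own staging table in parallel:
--   INSERT INTO dbo.Stage1 SELECT ... WHERE Id <  5000000;
--   INSERT INTO dbo.Stage2 SELECT ... WHERE Id >= 5000000;
-- Then one local, sequential consolidation step:
INSERT INTO dbo.BigTableLocal (Id, Col1, Col2)
SELECT Id, Col1, Col2 FROM dbo.Stage1
UNION ALL
SELECT Id, Col1, Col2 FROM dbo.Stage2;
```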

3 Answers


Have you read Load 1TB in less than 1 hour?

  1. Run as many load processes as you have available CPUs. If you have 32 CPUs, run 32 parallel loads. If you have 8 CPUs, run 8 parallel loads.
  2. If you have control over the creation of your input files, make them of a size that is evenly divisible by the number of load threads you want to run in parallel. Also make sure all records belong to one partition if you want to use the switch partition strategy.
  3. Use BULK insert instead of BCP if you are running the process on the SQL Server machine.
  4. Use table partitioning to gain another 8-10%, but only if your input files are GUARANTEED to match your partitioning function, meaning that all records in one file must be in the same partition.
  5. Use TABLOCK to avoid row at a time locking.
  6. Use ROWS PER BATCH = 2500, or something near this if you are importing multiple streams into one table.
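As a sketch of how tips 3, 5, and 6 combine in a single statement (the file path and table name are hypothetical; each parallel load would point at its own input file):

```sql
BULK INSERT dbo.BigTableLocal
FROM 'D:\loads\bigtable_part01.dat'
WITH (
    TABLOCK,               -- tip 5: bulk-update lock instead of row-at-a-time locks
    ROWS_PER_BATCH = 2500  -- tip 6: batch hint when importing multiple streams
);
```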

For SQL Server 2008, there are certain circumstances where you can utilize minimal logging for a standard INSERT SELECT:

SQL Server 2008 enhances the methods that it can handle with minimal logging. It supports minimally logged regular INSERT SELECT statements. In addition, turning on trace flag 610 lets SQL Server 2008 support minimal logging against a nonempty B-tree for new key ranges that cause allocations of new pages.
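A minimal sketch of what that looks like in practice, assuming the SIMPLE or BULK_LOGGED recovery model (table names hypothetical):

```sql
-- Trace flag 610 extends minimal logging to inserts into a nonempty
-- B-tree for new key ranges (SQL Server 2008).
DBCC TRACEON (610);

-- The TABLOCK hint is what makes a regular INSERT ... SELECT
-- eligible for minimal logging.
INSERT INTO dbo.BigTableLocal WITH (TABLOCK) (Id, Col1, Col2)
SELECT Id, Col1, Col2
FROM dbo.StagingTable;
```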

Chris Koester
8kb
  • Thanks, this is good but really doesn't apply (other than #1) to my situation because I'm not loading data from a file but from a linked server via ODBC drivers... so I do regular set-based insert statements anyway. So I wonder how I would go about doing the equivalent of setting ROWS PER BATCH on a normal insert statement, or if I even can? – Brandon Moore Jun 21 '12 at 07:55
  • Are you using SQL Server 2008 or above? Minimal logging on standard INSERT SELECT statements is supported under certain circumstances. – 8kb Jun 21 '12 at 19:41

If you're looking to do this in code, i.e. C#, there is the option to use SqlBulkCopy (in the System.Data.SqlClient namespace), and as this article suggests, it's possible to do this in parallel.

http://www.adathedev.co.uk/2011/01/sqlbulkcopy-to-sql-server-in-parallel.html

Chris Moutray
  • Only thing bad about doing anything bulk is that you'll have to rebuild your indexes afterwards, since it ignores them on the insert. – SQLMason Jun 21 '12 at 19:55
  • Well, I was just thinking of using c# to manage the parallelization. I don't think using SqlBulkCopy would be any faster than using SqlCommand.ExecuteNonQuery() to issue the same set based insert statement I would do directly from SSMS, do you? I think one thing people answering keep forgetting is that this isn't coming from a flat file... I have access to run regular insert statements directly on the source data. – Brandon Moore Jun 21 '12 at 23:47
  • I presume when you say "the same set-based insert statement" you mean you are able to do a cross-database join? Also, are you trying to insert into the source as well, or do you just mean you have read/write access to the source database (which is basically irrelevant to what you're trying to do here)? – Chris Moutray Jun 22 '12 at 05:42

If by any chance you've upgraded to SQL Server 2014, you can insert in parallel (the database compatibility level must be 110). See this: http://msdn.microsoft.com/en-us/library/bb510411%28v=sql.120%29.aspx
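The feature being referenced is parallel SELECT ... INTO; a minimal sketch with hypothetical names follows (note that INTO creates the target table, so this fits a staging-table approach rather than inserting into an existing table):

```sql
-- Hypothetical database name; parallel SELECT ... INTO requires
-- compatibility level 110 or higher on SQL Server 2014:
ALTER DATABASE MyDb SET COMPATIBILITY_LEVEL = 110;

-- The insert side of SELECT ... INTO can now run with a parallel plan:
SELECT Id, Col1, Col2
INTO dbo.BigTableCopy
FROM dbo.BigTableLocal;
```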

Fat Shogun