I'm trying to create a data import mechanism for a database that must stay highly available to readers while absorbing irregular, scheduled bulk loads of new data.
The new data involves just three tables: new datasets are added, many new dataset items reference those datasets, and a few dataset item metadata rows reference the items. A dataset may have tens of thousands of dataset items.
The dataset items table is heavily indexed on several combinations of columns, and the vast majority (but not all) of reads include the dataset id in the WHERE clause. Because of the indexes, inserts are now too slow to keep up with the inflow, but since readers of those indexes take priority I cannot drop the indexes on the main table; I need to work on a copy instead.
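For reference, the schema looks roughly like this (table, column, and index names are simplified stand-ins; the real tables have more columns and many more indexes):

```sql
-- Simplified sketch of the actual schema; real names and columns differ.
CREATE TABLE dbo.Datasets (
    DatasetId INT IDENTITY PRIMARY KEY,
    LoadedAt  DATETIME2 NOT NULL
);

CREATE TABLE dbo.DatasetItems (
    DatasetItemId BIGINT IDENTITY PRIMARY KEY,
    DatasetId     INT NOT NULL REFERENCES dbo.Datasets (DatasetId),
    ColA          INT NOT NULL,
    ColB          INT NOT NULL,
    ColC          NVARCHAR(100) NULL
);

CREATE TABLE dbo.DatasetItemMetadata (
    MetadataId    BIGINT IDENTITY PRIMARY KEY,
    DatasetItemId BIGINT NOT NULL REFERENCES dbo.DatasetItems (DatasetItemId),
    MetaKey       NVARCHAR(50) NOT NULL,
    MetaValue     NVARCHAR(200) NULL
);

-- A few of the many composite indexes that slow the inserts down;
-- most query predicates lead with DatasetId, but not all do.
CREATE INDEX IX_DatasetItems_Dataset_AB ON dbo.DatasetItems (DatasetId, ColA, ColB);
CREATE INDEX IX_DatasetItems_Dataset_B  ON dbo.DatasetItems (DatasetId, ColB) INCLUDE (ColC);
CREATE INDEX IX_DatasetItems_A_B        ON dbo.DatasetItems (ColA, ColB);
```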
I therefore need some kind of working table that I can copy into, insert into, and reindex before quickly switching it in to become part of the queried table/view, roughly as sketched below. The question is: how do I perform that switch quickly?
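This is approximately what I have in mind, sketched with `sp_rename` (the staging table name and file path are hypothetical):

```sql
-- Create an index-free copy of the current table and append the new batch,
-- all while readers continue hitting the live dbo.DatasetItems.
SELECT * INTO dbo.DatasetItems_Staging FROM dbo.DatasetItems;

BULK INSERT dbo.DatasetItems_Staging
    FROM 'C:\loads\new_items.dat'
    WITH (TABLOCK);

-- Rebuild the full index set on the copy offline.
CREATE INDEX IX_DatasetItems_Dataset_AB
    ON dbo.DatasetItems_Staging (DatasetId, ColA, ColB);
-- ...and so on for the other indexes...

-- The switch itself: two renames in one transaction so readers only ever
-- see a complete table, at the cost of a brief schema lock.
BEGIN TRAN;
    EXEC sp_rename 'dbo.DatasetItems', 'DatasetItems_Old';
    EXEC sp_rename 'dbo.DatasetItems_Staging', 'DatasetItems';
COMMIT;
```

One wrinkle I can already see with this: the foreign key from the metadata table follows the object rather than the name, so after the swap it still points at `DatasetItems_Old`.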
I have looked into partitioning the dataset items table by ranges of dataset id, which is a foreign key, but because that column isn't part of the primary key SQL Server doesn't seem to make this easy. I am not able to switch out the old data partition and switch in a freshly indexed, updated version.
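This is roughly the setup I tried (boundary values are hypothetical). The sticking point is that `ALTER TABLE ... SWITCH` needs every index, including the one backing the primary key, to be aligned with the partition scheme, which forces DatasetId into the key:

```sql
-- Partition DatasetItems by ranges of DatasetId (boundaries hypothetical).
CREATE PARTITION FUNCTION pfDatasetId (INT)
    AS RANGE RIGHT FOR VALUES (1000, 2000, 3000);

CREATE PARTITION SCHEME psDatasetId
    AS PARTITION pfDatasetId ALL TO ([PRIMARY]);

-- A unique index on a partitioned table must include the partitioning
-- column, so the primary key has to be widened to allow SWITCH:
CREATE TABLE dbo.DatasetItemsPartitioned (
    DatasetItemId BIGINT NOT NULL,
    DatasetId     INT    NOT NULL,
    ColA          INT    NOT NULL,
    ColB          INT    NOT NULL,
    CONSTRAINT PK_DatasetItemsPartitioned
        PRIMARY KEY (DatasetItemId, DatasetId)  -- forced to widen the PK
) ON psDatasetId (DatasetId);
```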
Various articles suggest partitioning, snapshot isolation, or partitioned views, but none directly addresses this situation: they cover either bulk loading and archiving of old data (partitioned by date) or simple transaction isolation, without considering indexing.
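For what it's worth, the partitioned-view variant I keep seeing would look roughly like the sketch below (names and CHECK ranges are my own illustration). It would let me fully index a per-batch table offline and then alter the view, but none of the articles discuss doing this under concurrent readers:

```sql
-- Each batch lives in its own fully indexed table with a CHECK constraint
-- on DatasetId; a UNION ALL view over them forms a local partitioned view,
-- so queries filtering on DatasetId can skip irrelevant batch tables.
CREATE TABLE dbo.DatasetItems_Batch1 (
    DatasetItemId BIGINT NOT NULL PRIMARY KEY,
    DatasetId     INT NOT NULL CHECK (DatasetId BETWEEN 1 AND 999),
    ColA          INT NOT NULL
);
CREATE TABLE dbo.DatasetItems_Batch2 (
    DatasetItemId BIGINT NOT NULL PRIMARY KEY,
    DatasetId     INT NOT NULL CHECK (DatasetId BETWEEN 1000 AND 1999),
    ColA          INT NOT NULL
);
GO
CREATE VIEW dbo.vDatasetItems AS
    SELECT DatasetItemId, DatasetId, ColA FROM dbo.DatasetItems_Batch1
    UNION ALL
    SELECT DatasetItemId, DatasetId, ColA FROM dbo.DatasetItems_Batch2;
GO
-- After loading and indexing a new batch table, ALTER VIEW adds it:
-- ALTER VIEW dbo.vDatasetItems AS ... UNION ALL
--     SELECT ... FROM dbo.DatasetItems_Batch3;
```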
Are there any examples that directly tackle this seemingly common problem?
What strategies do people use to minimize the time that indexes are unavailable when bulk loading new data into large indexed tables?