
I'm loading data files into a staging table, which works perfectly fine, but after that I need to convert to several datatypes for the final table, and the insert statement becomes infeasible for big tables. The statements are created by a tool of mine and I want to optimize this part. The insert statements look something like this, but with a lot more columns, validity checks and some needed replacements inside these converts:

INSERT INTO foo
SELECT
    CONVERT(decimal(10,2), col1),
    CONVERT(date,          col2),
    CONVERT(decimal(10,2), col3),
    CONVERT(int,           col4),
    CONVERT(decimal(10,2), col5)
    ...
FROM foo_staging

On small tables this works pretty fast, but as the table gets bigger the performance becomes really bad. Any ideas on how to improve the performance of the converts?

EDIT: I'm using BULK INSERT into the staging table and the files are really big.
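
For context, the load looks roughly like this (the file path, delimiters and batch size here are only placeholders, not my actual values):

BULK INSERT foo_staging
FROM 'C:\data\foo.txt'            -- placeholder path
WITH (
    FIELDTERMINATOR = '|',        -- placeholder delimiter
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 100000,     -- commit in batches rather than one huge transaction
    TABLOCK                       -- allows a minimally logged load where possible
);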

McNade
  • What is the need for having two of the same table with different datatypes? – TheGameiswar Feb 04 '16 at 14:45
  • The data I get from the source is really messed up in a lot of ways (e.g. on numerics the "-" is in the first or last position, text qualifiers, escaped characters which need to be replaced by the original characters...), so I thought that putting the messed-up data into a staging table and, after "cleaning", into a final table would be more efficient than I/O operations on the source data. – McNade Feb 04 '16 at 14:54
  • Can you have a trigger on the staging table which converts the data as above and inserts it into the final table every time a record is inserted? – TheGameiswar Feb 04 '16 at 14:59
  • I'm not sure there are benefits to using a trigger; I should have mentioned that I bulk insert big data chunks (40m rows) into the staging table at once. I'm sorry, I'm going to edit that. – McNade Feb 04 '16 at 15:06

1 Answer


A few quick thoughts, as there's no one Correct answer here:

  1. Load data into staging
  2. Clean/convert by copying to a second staging table defined using the desired datatypes. Good data gets copied over, bad data is left behind (see the sketch after this list).
  3. Copy data from the "clean" table to the "live" table
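
For step 2, here's a minimal sketch of what I mean, assuming SQL Server 2012 or later so that TRY_CONVERT is available (the foo_clean table and the column list are just examples to match your question):

INSERT INTO foo_clean (col1, col2, col3, col4, col5)
SELECT
    TRY_CONVERT(decimal(10,2), col1),
    TRY_CONVERT(date,          col2),
    TRY_CONVERT(decimal(10,2), col3),
    TRY_CONVERT(int,           col4),
    TRY_CONVERT(decimal(10,2), col5)
FROM foo_staging
-- keep only rows where every populated column converts cleanly;
-- anything that fails stays behind in foo_staging for inspection
WHERE (col1 IS NULL OR TRY_CONVERT(decimal(10,2), col1) IS NOT NULL)
  AND (col2 IS NULL OR TRY_CONVERT(date,          col2) IS NOT NULL)
  AND (col3 IS NULL OR TRY_CONVERT(decimal(10,2), col3) IS NOT NULL)
  AND (col4 IS NULL OR TRY_CONVERT(int,           col4) IS NOT NULL)
  AND (col5 IS NULL OR TRY_CONVERT(decimal(10,2), col5) IS NOT NULL);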

For copying data from table to table, perhaps the fastest option is to use table partitioning. This can get complex; you'd need to read up on the concept and figure out whether it applies to your data or not. (Also, you'd need SQL Server Enterprise edition.)
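
If partitioning does fit, the actual move is a metadata-only operation, very roughly like this (table names are placeholders; both tables must have identical structure and filegroup placement, the target partition must be empty, and foo_clean needs a CHECK constraint proving its rows fit that partition's boundaries):

-- Switch the cleaned staging table into an empty partition of the live table;
-- no rows are physically copied, only metadata changes.
ALTER TABLE foo_clean
SWITCH TO foo_live PARTITION 1;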

An alternative is to "chunkify" the data being copied, by breaking it into (say) sets of 10,000 rows, or some similarly arbitrary amount, and looping until you are done. This can be done using the TOP N syntax, figuring out date or key ranges based on the data being loaded, or perhaps even tricks with row_number().
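
For the chunked approach, a rough sketch, assuming the staging table has an ever-increasing key column (here called id, which is purely hypothetical) and reusing the converts from the question:

DECLARE @batch  int    = 10000;   -- rows per chunk
DECLARE @lastId bigint = 0;       -- watermark: highest key copied so far
DECLARE @maxId  bigint;

WHILE 1 = 1
BEGIN
    -- find the upper key boundary of the next chunk
    SELECT @maxId = MAX(id)
    FROM (SELECT TOP (@batch) id
          FROM foo_staging
          WHERE id > @lastId
          ORDER BY id) AS t;

    IF @maxId IS NULL BREAK;      -- nothing left to copy

    INSERT INTO foo
    SELECT CONVERT(decimal(10,2), col1),
           CONVERT(date,          col2),
           CONVERT(decimal(10,2), col3),
           CONVERT(int,           col4),
           CONVERT(decimal(10,2), col5)
           -- plus the rest of your converts/validity checks
    FROM foo_staging
    WHERE id > @lastId AND id <= @maxId;

    SET @lastId = @maxId;
END;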

(Note again that availability of these features depends on the version/edition of SQL that you are using.)

Philip Kelley
  • Thank you very much! I was able to improve the performance because I "chunkified" the data using an altered version of [this solution](http://stackoverflow.com/questions/864341/move-sql-server-data-in-limited-1000-row-chunks/865826#865826). – McNade Mar 24 '16 at 20:11