0

Here is the scenario

I have one staging table for csv file which is My source I am loading it into physical staging table I will be doing transformations on this staging table data in later part of package I need fresh data (as it is from source)

Should I do transformation in temp table or should I use dataflow task again to reload staging table

The data isnt more [Smile] just less than a million only

Ashish
  • 3
  • 3

1 Answers1

0

There is a standard pattern for this.

  1. Extract the data (from the CSV to your temp area)
  2. Transform the data (clean it, convert it, format it, join other stuff to it, make it compatible with your new system)
  3. Load the data (update/insert/delete to your live tables)

This is where the acronym for ETL comes from - http://en.wikipedia.org/wiki/Extract,_transform,_load

The primary advantages you have are that at point 1 you have only 1 thread/user loading the data so it can be extracted quickly, then at stage 2 you are manipulating the data without causing any locks on other tables. Finally, once the data is ready, you are able to load it in the quickest method possible to your live tables.

Your two biggest (often competing) concerns are Simplicity and Speed. Simplicity is great because it involves less code, makes for less debugging required and makes you far more confident that your data is clean. Sometimes you have to sacrifice simplicity for speed however.

In your case, since you are only loading a few million rows, I'd suggest you just reload the staging table every time so every single load uses the same ETL process. This keeps your ETL mechanism easy to code, maintain and explain.

FYI - if you're using SQL Server, check out SSIS.

Dave Hilditch
  • 5,299
  • 4
  • 27
  • 35
  • thnks I am using temp tables in the package as there are multiple other things (a complete business process and not just ETL) in My package. So I thought better to avoid consuming server space (RAM) due to temp table if I can find a better option. so the question. Also as you said maintainability does gets better – Ashish Sep 04 '12 at 14:48