0

I'm using SSIS to create a star schema for a data warehouse with surrogate keys (sg).

My process goes like this:

  1. find max sg (using SQL)

  2. in data flow: data source-> c# script that adds +1 to the max sg -> write to destination.

Now, with fixed dimensions it works without problems. Every added row gets the sequential sg.

However when I use the Slowly Changing Dimension and historically updating a row I get the following:

sg_key  |  name | city | current_row
1       |  a    |  X   | true
2       |  b    |  Y   | true
3       |  c    |  Z   | false
4       |  d    |  H   | true
7       |  c    |  T   | true

Now, correct me if I'm wrong but I always thought SSIS is pushing one row at a time through all the flow tasks, but it looks like it first generates ALL the sg_keys for all the rows and then sends the updated row through the flow.

Do I understand how SSISworks in a wrong way? How can I fix it?

Cheers, Mark.

mRiddle
  • 214
  • 1
  • 7
  • 22
  • Because you are generating the key, and then passing it to the [SCD](https://msdn.microsoft.com/en-us/library/ms141715.aspx), this is most likely a consequence of your C# logic. Is that code generating a unique key for each row, even if the row is not new? Managing SKeys can be a real pain. I would second the advice from @ArnaudGastelblum. Let [SQL Server handle this for you](https://msdn.microsoft.com/en-us/library/ms186775.aspx). – David Rushton Jan 04 '17 at 10:32
  • The code generates a new sg for every row. However, my understanding was that since the non-changed rows aren't being put in the destination table again, the max sg key will remain the same until the updated row will come. – mRiddle Jan 04 '17 at 11:57
  • Can you show us the code? I'm assuming your pipeline follows this basic pattern: Data Flow Task (Source >> Script Task >> Destination). Is that right? – David Rushton Jan 04 '17 at 12:05
  • Yes, that is my flow task. Not sure how to put a peice of code here: [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute] public class ScriptMain : UserComponent { private int NextKey = 0; public override void Input0_ProcessInputRow(Input0Buffer Row) { int MaxKey = Variables.MaxSgKey; if (NextKey == 0) { NextKey = MaxKey + 1; } else { NextKey = NextKey + 1; } Row.SgKey = NextKey; } } – mRiddle Jan 04 '17 at 12:14
  • You are right. Within a data flow; rows are processed one at a time. In this case each row from the source is passing into the script, which increments the surrogate count. The row then passes into the SCD. If the SCD does not create a new row then the new surrogate value is not used. If you want to avoid unused surrogate values you will need to update the C# logic, to increment only for new rows. Of course, you could argue that it does not matter if the surrogate keys jump from 4 to 7. Their value is meaningless. All that matters is: they are unique and consistent. – David Rushton Jan 04 '17 at 12:28
  • Oh, ok, got it. I thought that if the sg value wasn't used the data flow will start from the calculation of the previous step, but it seems like the max values is being drawn only once (by an sql step) and than does into the data flow. So I need either to distinguish the case when the sg is not recorded or look at other tools to increment an sg. – mRiddle Jan 04 '17 at 19:22

1 Answers1

4

If you use SQL Server as a destination, why didn't use an IDENTITY Column? (instead of a C# script)

https://msdn.microsoft.com/en-us/library/ms186775.aspx

Identity will automatically increment your column when you insert a new row. If you don't update this column, the value will not change.

Arnaud