1

I am not a seasoned BI developer so I need help populating my fact table. Firstly, I have populated all my Dimensions from my production database (I'm not using a staging database or tables) using the appropriate SSIS components.

DimParent, DimStudent, DimManager, and DimFacilitator use the natural key as the primary key. The rest of the dimensions use a surrogate key as the primary. The reason for using the natural keys is because I have the same database model for my production(OLTP) database over multiple different schemas (which act as my different Campus locations). DW Diagram

My measureable data is still in my production database and I can't seem to figure out how to populate my fact table. Production OLTP

I was thinking of using a large query with a join but it might get too complex regarding the way I populated my DimAssessmentType by using the query:

select PK_Assessment, [Description] 
from Auckland_Park.Assessment 
union 
select 3, 'International'
Jnr
  • 1,504
  • 2
  • 21
  • 36
  • I don't understand....do you need help writing an insert statement or something? – rory.ap Oct 19 '15 at 13:08
  • And? What have you tried? Where are you stuck? What help do you need? – Ragul Oct 19 '15 at 13:12
  • I want to populate my Fact table. I have tried a large select statement with joins but as I mentioned, I don't think it will work. Could it? Will it be that simple? – Jnr Oct 19 '15 at 13:18
  • You tried it but you don't think it will work? Doesn't seem like you actually tried it. – Nick.Mc Oct 19 '15 at 13:19
  • 1
    You can't get data into a table without an insert statement or a select into statement... – rory.ap Oct 19 '15 at 13:24

1 Answers1

4

Don't be inconsistent. Use surrogate keys for everything. Then no matter what happens (i.e. a campus comes online that does not follow this rule), you can account for it. Being inconsistent is just making work for yourself. Do the design right now. It's a hell of a job reloading a dimension and fact after you've got three years data in your fact.

Anyway. The way I populate a fact is:

  1. Load the facts into a staging table.
  2. The staging table has additional columns which contain your surrogate keys
  3. Run an update statement on your staging table that fills in the surrogate keys
  4. Pick an appropriate window in your fact. Delete and reload that window

It sounds like you might want to do an "inline lookup" in SSIS instead to find SK's. That's fine but it does make it difficult to troubleshoot. Also the SSIS lookup component doesn't scale well (i.e. works for few rows and is very very slow for many rows). Also it doesn't do SCD's very well.

Your statement " I can't seem to figure out..." is very vague. Follow the four steps above and tell me which one you can't figure out.

One issue might be that you are not preserving source system keys in your dim... so you can't look up the new surrogate keys based on source system keys.

Nick.Mc
  • 18,304
  • 6
  • 61
  • 91
  • I read a blog [here](http://sqlblog.com/blogs/jorg_klein/archive/2008/10/22/ssis-decrease-your-fact-table-loading-time-up-to-40.aspx) that states that your business/natural key should be the primary key linking to your fact tables foreign key. And as you stated "you should use surrogate keys for everything". This means I should use an _Identity_ column (surrogate key) and also a _primary key_ column (business key) in my Dimensions table then link the primary key to the Fact table's foreign key. – Jnr Oct 21 '15 at 08:23
  • 1
    You have both keys in your dimension. You ensure that they are both unique (via constraints). That blog says _Store the fact record in the DWH fact table with the gained surrogate keys of the dimensions_. So it is saying to match on _business keys_ in the staged source transactions and dimension and derive your surrogate key, and and _that_ SK goes in your fact. We are saying the same thing. See my last comment: "you are not preserving source system keys (business keys) in your dimension". You need to store both types of keys in your dimension but _only_ surrogate keys on your fact. – Nick.Mc Oct 21 '15 at 08:48
  • 1
    The idea is that the surrogate keys supports unanticipated changes in the business key, supports slowly changing dimensions. Your DW should not use business keys as anything functional (i.e. joins between tables). Business keys are only so you can reference back to source systems. Your DW joins are always on surrogate keys – Nick.Mc Oct 21 '15 at 08:50
  • 1
    The business key in your dimension is not necessarily the _primary key_. You must have unique constraints on both but often in a dimension the surrogate key is the primary key (not always). The business key _must_ have a unique constraint on it at least but it doesn't need to be the primary key. – Nick.Mc Oct 21 '15 at 08:54
  • 1
    First step to load your fact: Write a select statement that contains your facts (additive numbers) as well as your _business keys_. If you want to follow that blogs advice, you feed that result into a bunch of SSIS lookups and use the business key to lookup the surrogate key. Then you write the surrogate key to the fact. – Nick.Mc Oct 21 '15 at 09:11