2

I'm designing on my first real Kimball-style star schema dimensional model. I've slogged through a lot of his books, and I'm part-way through Star Schema - The Complete Reference by Adamson (which I find much more practical and straightforward than the Kimball books). I've not been able to find an answer to this question. Please help! A reference to an article or book would be fine.

In dimensional modeling, what is the canonical way to model the relationship between a fact and its corresponding baseline(s), projection(s), or goal(s)?

For instance, suppose that, for company A, in year 2016, its actual sales were $1 million. That's clearly a fact.

Additionally suppose that, in 2014, the company projected sales for 2016 to be $1.2 million, and in 2015, the company projected sales for 2016 to be $1.1 million.

But it turns out that the 2014 (older) projections are the one against which the 2016 sales must be measured. In other words, we need to explicitly specify the relationship between actual sales and their related projected sales estimate. So "drill-across" won't work, because we aren't sure which projections are the right ones to be comparing to.

Essentially, it seems like a fact needs to be explicitly related to another fact, which (according to the literature) is verboten?

So which of these implementations is the canonically best?

  1. Create fact_sales and fact_sales_projection (at the same grain). Include sales_projection_key in fact_sales, essentially linking a fact to a fact (which is supposedly not a good idea).

  2. Create fact_sales and dim_sales_projection (at the same grain), so call the projections dimensions, even though they contain the same numbers as the facts they support. Again include sales_project_key in fact_sales, but it's OK now, because semantically, it's a dimension.

  3. Just create fact_sales with a dimension called SalesType which is either "Actual" or "Projected". Include both the actual and projected sales in the same table, with a self-joining key from "actual" sales records to their correct "projected" sales record.

  4. Just create fact_sales, but add additional fact columns which contain the projections as well as actual sales. This will result in considerable duplication of projection data but ensures that actual data are kept side-by-side with the projection that "matters".

Of these, I'm pretty sure that #3 is not the best solution. I've waffled a lot as to whether #1, #2, or #4 is best, although (to me) #1 seems like a better idea than #2.

Any input is appreciated. Also, I'm not entirely clear if this type of questions is better for the "Stack Overflow" forum or the "Database Administrators" forum?

Jim Burnell
  • 948
  • 3
  • 9
  • 21
  • I'd go with #1 and create a link via the date dimension and maybe a department or other related items. Do you know about [Galaxy Schemas](https://en.wikipedia.org/wiki/Fact_constellation)? – tobi6 Jun 17 '16 at 07:45
  • I Googled galaxy/constellation schemas, but I didn't find much substantive. It seems to me like it's really just a star schema with multiple stars, but the star schema books I've read advocate multiple stars already. Am I missing something? – Jim Burnell Jun 17 '16 at 14:20
  • You are right, it is a constellation of stars (schemas). This should cover your requirement, though. Since there are two different fact constellations which have to be connected. – tobi6 Jun 17 '16 at 14:36

1 Answers1

1

I'd suggest you model it as it should be, with sales and projections in their own facts. You could handle the odd mapping of year's projection to use in the query.

Kimball would have you model real things.

If it turns out 2014 isnt a year but is instead a name of a forecast set, it should be modelled as such so you select the name of the forecast set in the query.

Rich
  • 2,207
  • 1
  • 23
  • 27
  • Thanks for answering... so is it OK to have a key that maps a fact to a different fact? – Jim Burnell Mar 21 '17 at 13:42
  • You can, but not sure you'd need to here. The way you should typically link between sales and target facts is to filter both by the dimensions in common. So you have your query on your sales fact (grouped by stores, products, time) and you have your query on your targets fact (grouped by stores, products, time). You simply join the two queries to get both sales and targets for stores, products, time. – Rich Mar 21 '17 at 13:45