4

I am a newcomer to data warehouses and have what I hope is an easy question about building a star schema:

If I have a fact table where a fact record naturally has a one-to-many relationship with a single dimension, how can a star schema be modeled to support this? For example:

  • Fact Table: Point of Sale entry (the measurement is DollarAmount)
  • Dimension Table: Promotions (these are sales promotions in effect when a sale was made)

The situation is that I want a single Point Of Sale entry to be associated with multiple different Promotions. These Promotions cannot be their own dimensions as there are many many many promotions.

How do I do this?

Jon Seigel
  • 12,251
  • 8
  • 58
  • 92
Mike Gates
  • 1,874
  • 3
  • 21
  • 40

3 Answers3

8

For cases when you truly have a "multi-valued" dimension, a Bridge Table is usually the solution that Kimball recommends.

Your "Promotion" dimension simply is a record of each promotion, with its attributes (start date, end date, coupon code, POS promo code, Ad Name, etc). The relationship from promo to product isn't modeled here, since it will be reflected in the fact table.

Promotion/Discount Dimension would look like (1 row per unique planned promotion)

Promotion Dim ID
Promo Code
Coupon Code
Promo Start DTTM
Promo End DTTM
... etc ...

Your Sales Fact would look like:

Tran Date
Tran Line #
Customer Dim ID
Product Dim ID
Promotion Group Dim ID
Net Sale Price
Average Cost
Discount Amount

Your "Promotion Group" bridge table would then be the set of combinations:

Promotion Group Dim ID
Promotion Dim ID

If a sale occurs that has 3 promotions on it, you simply create group ID that relates to each promo, then put the group ID on the fact table. It's very similar to the way that medical reporting systems deal with multiple diagnoses.

Note that by using a Bridge table, you can easily double count sales, so I advise that reports using this method be developed by folks that understand the model.

N West
  • 6,768
  • 25
  • 40
  • 3
    This is an answer 2.5 years late, but I've been dealing with this myself, so I figured I'd add my answer. – N West Dec 03 '12 at 14:20
  • 1
    Thanks for taking the time to do it! – d_a_c321 Feb 20 '13 at 04:17
  • Note that I have evolved in my thinking and would not necessarily recommend a Kimball-style approach to anything. However, as you will hear that from the majority of data warehouse people, I'll leave the answer as is. – N West Feb 03 '16 at 14:26
  • @NWest I have an `orders` fact table and a `salesman` dimension. Sometimes (not very often) one order will be related to more than 1 salesman, how would you suggest this model? – mingchau Apr 18 '19 at 03:49
1

Time is almost always a dimension in a star schema.

"In effect" suggests that there is a start and end date for a Promotion.

So a Promotion might itself be a fact that has a start and end date reference to the Time dimension.

Maybe with a model like this you could have a JOIN table to relate Sale to Promotion in a many-to-many fashion between facts.

"Many, many" Promotions - yes, but how large is that? One per day means 365 records per year. I'll assume that Promotions are associated somehow with Products or Categories. A Sale would have a timestamp and multiple Products.

You have to store them somewhere, sometime or your model falls apart. Why the reluctance to model Promotion that way?

My advice would be to not worry about the size of the data and concentrate on modeling the problem as best you can. Get the logical model right first, then worry about the physical model and the data sizes.

duffymo
  • 305,152
  • 44
  • 369
  • 561
  • This i a fictitious example, but the reason I couldn't use time as a join would be that a particular product is under a promotion during a time period when not all products are under the same promotion. Think of coupons, these are promotions which can apply to specific products, but they can also apply to more than one product at a time. I know I have to store promotions somewhere, but having a dimension for every promotion would not work. I know enough at this point to know that I don't want a fact table with 3000 columns, most of them pointing to a "Not under this promotion" record. – Mike Gates May 07 '10 at 00:20
  • Everything I read about star schema says that time is the FIRST dimension you include. And I don't think it would be 3000 columns point to promotion, which would obviously break first normal form for any relational model. It would be a single key to the promotion at hand for that sale. How many apply to a single sale? – duffymo May 07 '10 at 00:24
  • I hoped it wouldn't get to this. This was a fictitious example so of course logical holes would develop. How about...each POS entry can be a member of 1 or more named groups. There can be any number of these groups and are assigned by a grouping table out in the normal relational world. I want to be able to query the cube (built from the star schema) to find out what POS entries fall into groups: "Group 1", "Group 2" and "Group 3". And again, there can be any number of groups out there as they are custom created by managers or something like that. – Mike Gates May 07 '10 at 00:48
  • Or, to continue the previous example, say that 0, 1, or more promotions can apply to a single sale. – Mike Gates May 07 '10 at 00:51
0

You should load a fact record for each promotion, even if the dollar amount is the same. If in fact, each type of promotion in your example is truly represented by this specific dollar amount, then a fact record should be loaded with the key of the promotion type, also containing keys back to other related dimensions (including Date).

The main point here is don't worry about data duplication. Think about a sales-oriented Data Warehouse, for say, a fast food company. One can assume there won't be just one fact record for $4.13, which is used to represent a million distinct sales of "value meal #3". Instead, each record in the "Transaction" dimension would have a relationship with at least one specific fact record in this hypothetical Sales fact table.

jamz
  • 4,991
  • 4
  • 24
  • 19
  • In my use case, I have an `orders` fact table and a `salesman` dimension. Sometimes (not very often) one order will be related to more than 1 salesman, how would you suggest this model? It's like in your opinion, the fact table will have two same row except the FK salesman_id? But this will have problem when do sum aggregation. – mingchau Apr 18 '19 at 03:54