0

I'm new in DW and I need to create a very simple warehouse for an e-commerce website.

Dimension tables

  • date dimension table (id, year, quarter, month, day)
  • time dimension table (id, hour, minute)
  • product dimension table (id, product name, price, category id)
  • product category dimension table (id, category name)

Fact table

  • sales per product (date id, product id, sales count, price sum)

This fact table is suitable for questions like "How much products were sold in June" etc.

But I will need answer questions like "In which category were sold the most products in June?" or "What is the most business successful hour of the day on wednesdays?".

I see two possibilities:

  1. I can add new column (category id or time id) into product fact table. But this id will change table's granularity
  2. I can create another fact table for categories which will contains facts about categories. But this (in my opinion) is wasting disk space, isn't it?

Which possibility is correct?

jnemecz
  • 3,171
  • 8
  • 41
  • 77
  • 1
    May downvoter explain what is wrong with this question? I can to add some missing information if needed. – jnemecz Mar 02 '13 at 06:22

3 Answers3

1

Your sales fact table should be by order line item

You could add a degenerate dimension just for hour of the day, or add a time dimension table at the grain of hour.

Why don't you just add category name to dim_product?

Neil McGuigan
  • 46,580
  • 12
  • 123
  • 152
  • Because product may be listed in more categories. – jnemecz Mar 02 '13 at 05:58
  • 1
    Its fine, it's just a product hierarchy. You can have the same category in many records of the product dim. Data duplication is OK in a star schema. – N West Mar 04 '13 at 01:47
0

If you create your Fact_table at the sales per item level, i.e. at almost the transactional grain, I think you can then add category id to the Fact_table referencing to one Dim_category. This definitely adds to the disk storage with respect to the Fact_table, but at later stage this Datamart will be able to answer any question that u have.

Divas
  • 455
  • 2
  • 7
  • 14
0

Yup. Always create the fact at the most granular that occurs in you business process (if your source system allows you to)- in this case each sale of each product to a customer at a specific data and time. You can always create aggregates from those granular facts later if (and only if) needed for performance

And disk space is pretty cheap compared to the effort of rework.