0

Considering there is a data warehouse contains one fact table and three dimension tables.

Fact table:

  • fact_orders

Dimension tables:

  • dim_user
  • dim_product
  • dim_date

All the data of these tables are extracted from our business systems.

In the business system, the user has many attributes, some of which could change upon time(mobile, avatar_url, nick_name, status), some others won't change once the record is created(id,gender,register_channel).

So generally in the dim_user table, which fields should we use and why?

mingchau
  • 450
  • 3
  • 12

1 Answers1

1

Dim_User should have both changeable and unchangeable fields. In denormalized model, it is preferrable to keep all the related attributes of a dimension in a single table.

Also, it is preferrable to keep all the information available about user in the dimension table, as they might be used for reporting purposes. If they won't be needed for reporting purpose, you can skip them.

If you want to keep the history of change of the user, you can consider implementing slowly changing dimensions. Otherwise, you can update the dimension attributes, as and when they change. It is called SCD Type I.

Venkataraman R
  • 12,181
  • 2
  • 31
  • 58
  • The data warehouse will be mainly used for OLAP cube analysis. I read some articles, they claim that the dimensions should only contain the attributes that could be used to slice the cube. In my case, the changeable attribute, say the field `nickname`, I don't think it could be used to slice the cube. Should we still put it in the dimension table? – mingchau Apr 12 '19 at 08:24
  • The only usage for the `nickname` field as I could think is when we slice the cube and want to get the users' detailed information in that cuboid. Is this a intended usage of a data warehouse? – mingchau Apr 12 '19 at 08:29
  • @mingchau, No. It is not true. You can additional have filters, which used to limit the data. See [difference between slicers and Filters](https://www.modbi.com/blog/power-bi-slicer-vs-filter). If the attribute can be used as filter or slicer, then add to the dimension. Otherwise, dont add. – Venkataraman R Apr 12 '19 at 08:33
  • Venka, thanks for replying. Would you mind message you by email? Since I'm quite new to the DW subject, and I have many basic questions upon this. – mingchau Apr 15 '19 at 02:57
  • @mingchau, you can raise different questions here or as separate question. It will be helpful for others as well, as it is public and helpful in future – Venkataraman R Apr 15 '19 at 03:54
  • Ok. In our business system, we have an `order` table and an `order_item` table, with 1:N relationship (1 order could contain N different items). When designing the DW, I consider use the `order_item` table as the fact table to relate to other dims(product, user). But there is a `status` field in the order table(which indicates the order was shipped, paid, or payment pending, etc), how should i put the `status` in the fact table? – mingchau Apr 15 '19 at 06:20
  • Also, do we need to design multiple data warehouses? If the answer is no, should we load all our data into the only data warehouse? I read opinions say that the data warehouse should be the only portal for data applications. In our business, there are some tables that neither are facts nor dimensions. – mingchau Apr 15 '19 at 06:26
  • @mingchau, this kind of DWH design needs lots of consideration. For Order header, Order detail, refer to this https://www.slideshare.net/nongoffna/data-warehouse-order-management. Also, I would suggest you to raise separate questions for each design. There are many experts and they will give best choices. It has been long time, since I worked on a DWH. – Venkataraman R Apr 15 '19 at 06:41