0

I have the following entities in my data mart:

  • Location: Contains information like Country, City, etc.
  • Users: Contains information about a user this includes their address, which includes Country and City

Both these dimensions contain attributes regarding country and city. Storing the attributes only in the location dimension would make more sense, so we don't duplicate data and we can retrieve the user's location information by joining the tables surrogate keys together through the fact table. However, doing it this way would increase the overhead because we would need to perform multiple joins.

I’m inexperienced with dimensional modelling and data warehousing, so I don’t know which one is better practice:

  1. Let both of the dimensions contain the same attributes
  2. Put the attributes in the table that it belongs to (in this case the Location dimension)
Amous
  • 534
  • 5
  • 18

1 Answers1

0

Option 1 is nearer to the Kimball way. Dimensional modelling doesn't prioritise avoiding duplication of data. In this case, they'd argue that the data 'belongs to' multiple tables. Users have information about their address, so that that's information about a user- it belongs to the user.

The important thing is that the data is conformed in the naming of the fields and the nature of the content- its up to you as the person in charge of ETL/the DW to do the work to make sure this is the case.

A lot of people go with Option 2. For this way, I don't know the nature of your fact (so I might be wrong) but it sounds like if you wanted to know the Location of a User you would have a key from the dimension rather than the fact- which would be a 'snowflake' structure. These aren't always wrong, but don't allow you to get the most out of a dimensional model.

Rich
  • 2,207
  • 1
  • 23
  • 27
  • Let's say I'm trying to measure the products bought buy a user (customer) at a certain location on a certain day. My fact table profiles these transactions. One of the questions I want to answer is the different areas (locations) that users buy products. – Amous May 19 '17 at 12:06
  • If it's the location of the user and not the store they bought it in, then the location would be in the user dimension table. – Rich May 19 '17 at 12:42
  • I'm more interested in the location of the product (eg, where the product was purchased). In this case would it make sense to put the location in the product dimension? If it does go in the product dimension, then would I really need a location dimension? – Amous May 19 '17 at 14:45
  • It depends where your location data for the purchase comes from. if its a 'location of the customer' that you know about the customer, then put it on the customer dimension. if its the location of a retail outlet, put it as a location on the retail outlet dimension. If its a location that is simply stamped on the source of the fact data for each purchase transaction (independent of the customer) then put it in its own dimension that the fact links to. – Rich May 19 '17 at 14:50
  • Ah, the location data is the location of the customer. I shall put it in the customer dimension. By doing this I wouldn't need a location dimension? – Amous May 19 '17 at 14:58