1

I'm struggling to understand the best way to model a particular scenario for a data warehouse.

I have a Person dimension, and a Tenancy dimension. A person could be on 0, 1 or (rarely) multiple tenancies at any one time, and will often have a succession of tenancies over time. A tenancy could have one or more people associated with it. The people associated with a tenancy can change over time, and tenancies generally last for many years.

One option is to add tenancy reference, start and end dates to the Person Dimension as type 2 SCD columns. This would work well as long as I ignore the possibility of multiple concurrent tenancies for a person. However, I have other areas of the data warehouse where I am facing a similar design issue and ignoring multiple relationships is not a possibility.

Another option is to model the relationship as an accumulating snapshot fact table. I'm not sure how well this would work in practice though as I could only link it to one version of a Person and Tenancy (both of which will have type 2 SCD columns) and that would seem to make it impossible to produce current or historical reports that link people and tenancies together.

Are there any recommended ways of modelling this type of relationship?

Edit based on the patient answer and comments given by SQL.Injection

I've produced a basic model showing the model as described by SQL.Injection. Suggested model

I've moved tenancy start/end dates to the 'junk' dimension (Dim.Tenancy) and added Person tenancy start/end dates to the fact table as I felt that was a more accurate way to describe the relationship.

However, now that I see it visually I don't think that this is fundamentally any different from the model that I started with, other than the fact table is a periodic snapshot rather than an accumulating snapshot. It certainly seems to suffer from the same flaw that whenever I update a type 2 slowly changing attribute in any of the dimensions it is not reflected in the fact.

In order to make this work to reflect current changes and also allow historical reporting it seems that I will have to add a row to the fact table every time a SCD2 change occurs on any of the dimensions. Then, in order to prevent over-counting by joining to multiple versions of the same entity I will also need to add new versions of the other related dimensions so that I have new keys to join on.

I need to think about this some more. I'm beginning to think that the database model is right and that it's my understanding of how the model will be used that is wrong.

In the meantime any comments or suggestions are welcome!

paulH
  • 1,102
  • 16
  • 43
  • What do you need to present in the report later? You could go with a fine-grained, transactional fact table - every time a person is assigned to or removed from a tenancy, it's a transaction; this will work well with many-to-many relationships. When a person or tenancy changes, you a `move out` transaction linked to the old version followed by an immediate `move in` linked to the new version. – Marek Grzenkowicz Jan 27 '15 at 21:01
  • Well I briefly considered something like that but discarded the idea because I felt that it would make it difficult for users to produce reports. As for what I need to present in the report - I don't know. I am not building a data warehouse to accommodate one report, I am building it to allow business users to query for themselves (probably via a cube) and produce whatever report they want to produce. I thought that was a better way to do things - though it is probably more difficult for me! – paulH Jan 28 '15 at 09:07

1 Answers1

1

Your problem is similar to to the sale transactions with multiple item. The difference, is that a transaction usually has multiple items and your tenancy fact usually has a single person (the tenant).

Your hydra is born because you are trying to model the tenancy as a dimension, when you should be modeling it as a fact.

The reason why I think you have a tenancy dimension, is because somewhere you have a fact rent. To model the fact rent consider use the same approach i stated above, if two persons are tenants of the same property two fact records should be inserted each month: 1) And now comes some magic (that is no magic at all), split the value of the of the rent by the number of tenants and store it the fact 2) store also the full value of the rent (you don't know how the data scientist is going to use the data) 3) check 1) with the business user (i mean people that build the risk models); there might be some advanced rule on how to do the spliting (a similar thing happens when the cost of shipping is to be divided across multiple item lines of the same order -- it might not be uniformly distributed)

SQL.injection
  • 2,607
  • 5
  • 20
  • 37
  • I'm not sure that I agree with your assessment. There are a bunch of attributes of a tenancy that need to be stored - start date, end date, tenancy type etc. that are independent of the people making up the tenancy. There are also many fact tables that need to be related to a tenancy rather than a person - one of them being a rent transaction fact table - and so I need a Tenancy dimension to link them to. – paulH Jan 29 '15 at 11:52
  • start date, end date <- standard date dimension attributes. Well dimensional modeling is not the same as E-R modeling. Regarding the facts that need to be related to the a tenacy I provided you an example how to do it for the fact rent. BTW I added a link, to help you understand better. – SQL.injection Jan 29 '15 at 12:02
  • Do you think that the same sale don't appear over multiple fact tables in a enterprise data warehouse? If you don't embrace the redundancy and denationalization, then you are doing dimensional modeling wrong. – SQL.injection Jan 29 '15 at 12:05
  • I'm quite willing to accept that I am probably doing it wrong, I just need to understand **how** I am doing it wrong. If I take your approach then what would I do with the tenancy start/end dates and tenancy type attribute? Do they get added to the Person dimension? – paulH Jan 29 '15 at 12:48
  • Also, I need to consider other similar scenarios. For example I have a Property dimension with lots of attributes about a property. Re-read my original question and substitute every occurrence of 'Tenancy' with 'Property' and you see it's the exact same problem - lots of people linked to lots of properties over a long time span. Do I also discard the Property dimension? – paulH Jan 29 '15 at 12:50
  • 1
    no, fact_tenency(key_tenency [surogate key], key_person, key_property, key_date_start, key_date_end, tenency_type [dimensional attribute, probably you should create a mini or junk dimension for these tenancy related atributes, rent_value_year [full amount of the yearly rent], rent_value_per_tenant_yearly [rent_value_year / # tenants] , contract_number [contract number or whatever is the equivalent document or id in the operational system]). – SQL.injection Jan 29 '15 at 13:37
  • When doing dimensional modeling, you should aim to easy-of-use for consumption to the end users and for query performance. – SQL.injection Jan 29 '15 at 13:41
  • I've edited my question based on your helpful responses. Thank you. – paulH Jan 29 '15 at 18:02