Handling multiple grains within a star schema

Question

I'm trying to model a business process that is inherently measured at multiple grains. Usually, this would necessitate one fact table per grain. Because this a single business process and only one of the dimensions is at a mixed grain (for some records) I'm not sure a separate fact table makes the most sense.

The process itself is based on measuring a Research Application. Each application can have applicants, funders, collaborators, and so forth. Additionally, each application can be managed by an organisation. For the M:N relationships I'm using bridge tables and weighting factors. The problem lies with the organisation dimension, which models a slightly ragged hierarchy as fixed depth attributes.

dim_organisation
id, organisation, faculty, school, division, unit

Each fact record has the same dimensionality with the exception of this dimension. Sometimes the application is managed by a faculty (level 2 in the hierarchy), and sometimes by a school (level 3 in the hierarchy). Furthermore, the fact record itself will only contain the business key for one of those levels e.g. school_code or faculty_code.

Here's how I believe the problem can and should be solved but I'd like some validation of this approach and / or some better proposals if necessary:

The initial dim_organisation table is populated via an external, master data source. The data is always balanced, i.e., there's no missing levels in the data, but it's ragged, so that some entries end at school, whereas others go right down to the unit level:

id, organisation, faculty, school, division, unit
1, org A, faculty A, school A, NULL, NULL
2. org A, faculty B, school B, division B, unit B
3. org A, faculty C, NULL, NULL, NULL

Because these records are at different grains I've copied down the last non-NULL level to complete the hierarchy:

id, organisation, faculty, school, division, unit
1, org A, faculty A, school A, school A, school A
2. org A, faculty B, school B, division B, unit B
3. org A, faculty C, faculty C, faculty C, faculty C

This ensures that every record in the org_dimension is at the same grain and is a standard approach for handling slightly ragged hierarchies. In addition, each of these levels has their own code e.g. L456 for a level 4 division or L521 for a level 5 unit. These are the business keys obtained from the source system.

Therefore, I can only refer to a single record in the dimension by combining all of the level codes accordingly. At the moment I'm creating a hash key on these level codes and storing the value in a lookup column on the dimension.

Assuming this approach is correct, I then have fact records coming in as follows:

application_id, organisation_id, applicant_id, ...
1, L456, 99
2, L321, 50
3, L549, 20

As you can see, the application fact is linked to my organisation dimension at different grains e.g. Level 4, Level 3, Level 5 and so forth. Because of the changes I've made to the dimension, I believe I now need to do the following:

1. Lookup the level code from dim_organisation.
2. Return the parent levels.
3. Copy down the level value associated with the fact to level 5.
4. Hash the keys and lookup the corresponding dimensional record.

For example:

1. Lookup L456 to return Division e.g. "Research and Engineering".
2. Return parents: "UoM" -> "Faculty of R&D" -> "School of Engineering".
3. Copy levels: L1 -> L2 -> L3 -> "Research and Engineering" (L4) -> "Research Engineering" (L5).
4. Now we have all the levels (parents + cascaded) to give us a unique record to look up in dim_organisation.

I'd like to know if this approach makes sense or if there is a better and more intuitive way of doing this? It's slightly messy because of the source data that I'm dealing with but that's the data reality I have to work with.

score 1 · Answer 1 · answered Jun 18 '19 at 13:46

You've done good with your dimension by pushing the ragged hierarchy down to the lowest grain. Now have your fact record reference the unique row indicator for the dimension.

1, org A, faculty A, school A, school A, school A
2. org A, faculty B, school B, division B, unit B
3. org A, faculty C, faculty C, faculty C, faculty C

If the fact event is related to school A, the fact would store row id #1.

The only caveat to this approach is that the real level of the dim should be identifiable by the content. In other words, if School A is West Side High School and faculty C is Mr West, you wouldn't want them both described as "WEST". If the content of each level is fully descriptive then this model will work just fine.

I have used this exact same approach to model an organizational hierarchy containing up to 10 levels of reporting.

Handling multiple grains within a star schema

1 Answers1