Star schema many-to-many for time series data

Question

I have a DataWarehouse / star schema modelling question.

I have meter readings (facts) for different locations (dimensions). Each location has multiple organisations in different roles. I would like to model this in a data-warehouse schema. Current plan is:

fact_meter_readings
   timestamp,
   location_id,
   meter_reading (number)

dim_locations
   location_id
   name

dim_location_organisations
   location_id
   organisation_id
   role_id
   organisation_name
   role_name

Unique key (location_id, organisation_id, role_id)

The last table is the one in question. Although the table has a unique constraint, the unique constraint is not shared to the fact table, which is the bit I'm unclear about.

I am wanting to use this schema to answer questions like sum the meter_reading grouped by organisation where they are in the role "operator"

e.g.

select 
  lo.organisation_name,
  sum(meter_reading) 
from fact_meter_readings m
inner join dim_location_organisations lo
on lo.location_id = m.location_id
where lo.role_id = xyz
group by lo.organisation_name

Is there any issue here as the dimension table does not have a unique key that is shared with the fact table? Obviously queries would need to handled carefully to avoid double counting facts.

score 0 · Answer 1 · answered Apr 09 '20 at 11:01

0

I would create a new table called organization, as location and organization are two different concepts. Then you can query organization through the location table:

fact_meter_readings
   timestamp,
   location_id,
   meter_reading (number)

dim_locations
   location_id
   name

dim_location_organisations
   location_id
   role_id
   role_name

dim_organization
   organization_id
   location_id
   organisation_name

And probably "roles" also deserve their own dimension. The query will be a bit longer but I believe its clearer to give each concept its own dimension and will pay off in the future

answered Apr 09 '20 at 11:01

Matias Thayer

571
3
8

I prefer to keep the dimensions as flat as possible. In the companion relational system these concepts are highly normalised, the dimensions here are synced. – Paul Grimshaw Apr 09 '20 at 11:53
1

mm, OK, in that case I don't see a big problem. But it may lead to confusion in the future and people doing double countings inadvertently. A sugestion: If you don't have storage issues, you may find it useful, to normalise dims as I suggestd, and then creating a big table so no one has to invest time in joins. Some sort of materialised view with the result of your query. We used to do that in a couple of past jobs, and it was quite efficient – Matias Thayer Apr 09 '20 at 14:07
Yes the double counting is the risk, which is why i'm curious about ideas of a different structure. From what i understand, your structure above would not prevent double counting but just add an additional join to the query – Paul Grimshaw Apr 13 '20 at 09:30

Star schema many-to-many for time series data

1 Answers1