3

I've been doing some reading on architecture types for OLAP processing, specifically the Star Schema concept. I'm currently setting up a database for testing.

My Situation

I have 750 sensors, each will be posting data to an SQL database every minute. Currently, my approach is to post if there is a change in the value. Thinking about this a little further makes me wonder if it could pose an issue in determining if there is data loss vs an unchanged value and makes me re-consider scheduled posting.

The software which inserts the sensor's value posts a time stamp in the format 2020-01-23 13:48:52. Along with this, the sensor name and sensor value are also posted. So, 4 columns in my fact table if you count the primary key.

I know I'll need to do some intermediary processing on my data (probably a table that is processed on a schedule to get it to conform to the schema, then dropped), but for now I'm trying to keep this high-level and tackle one issue at a time.

Here is what I kind of have envisioned currently, but I'm lost on how to handle the Time Dimension considering the resolution I need.

Star Schema Mock-up

My Questions

"Currently, my approach is to post if there is a change in the value. Thinking about this a little further makes me wonder if it could pose an issue in determining if there is data loss vs an unchanged value and makes me re-consider scheduled posting."

Is this a valid concern to identify data loss or is there another way to address it?

Considering my needed resolution, what would a sample time dimension look like?

Ajility
  • 526
  • 3
  • 19

1 Answers1

5

If I understand your situation correctly, you are recording values that represent some system status at a point of time. In other words, you are capturing the system's snapshot in time. In a star schema, you should then use a "periodic snapshot fact table". Such fact tables capture values regardless if they have changed or not, because their grain is a unit of date/time, not a transaction.

Time dimension can be modeled in several ways. I would do the following:

Create dimension "Date" to deal with calendar dates. Grain: one calendar day. Span: from the day you have data, until today including.

Create dimension "Time" to deal with time of the day. Grain: one minute. Span: 24 hours (so total records in this table are 60 min * 24 hours).

Add Date and Time keys to the fact table.

I would also keep time stamp as a degenerate dimension in the fact table, to have access to seconds if needed.

RADO
  • 7,733
  • 3
  • 19
  • 33