I'm setting up a TimescaleDB for measurement values. We get measurements that span an interval, so we have a start and end time for measurements. (The interval can be different for different sensors.) We also get updates on measurements and we need to keep track of those. This means the value of a measurement can get corrected and we want to keep both the original and the corrected version.
This results in the following table structure:
sensor_id | start_utc | end_utc | ingested_utc | value | quality |
---|
start_utc
andend_utc
are the start and end of the measured interval.ingested_utc
is the timestamp when we ingested the value. If we get an update of the value, we typically have the samesensor_id
,start_utc
andend_utc
, with a differentingested_utc
.value
is the measured value for the given interval.quality
is an enum that gives info about the quality of the measurement. Things likeinterpolated
andfinal
.
I have created a TimescaleDB hypertable from this table. Now I want to create a continuous aggregate. It needs to adhere to the following rules:
- Rollup values to hourly intervals.
- Sum the values of intervals within that hour.
- If more than one measurement exist, choose the most recently ingested one.
This is the query I've come up with so far:
select
sensor_id,
time_bucket('1 hour', start_utc) as start_utc,
sum(value) as value,
ingested_utc,
quality
from
raw_measurement_data
where
(sensor_id, start_utc, end_utc, ingested_utc) in (
select
sensor_id,
start_utc,
end_utc,
max(ingested_utc)
from
raw_measurement_data
group by
sensor_id, start_utc, end_utc
)
group by
sensor_id, start_utc, ingested_utc, quality
order by sensor_id, start_utc asc
;
This seems to work, at least with my small test data set. However, I'd like to do this more "idiomatic" TimescaleDB. I figure I could used the last()
function from TimescaleDB and that this would perform better on a large dataset.
I've quite some experience with PostgreSQL, but I'm new to TimescaleDB. Any tips on improving this query are more than welcome! The end goal is, of course, to create a continuous aggregate from this query.
As requested in the comments, here's the verbose query plan.