3

I'm setting up a TimescaleDB for measurement values. We get measurements that span an interval, so we have a start and end time for measurements. (The interval can be different for different sensors.) We also get updates on measurements and we need to keep track of those. This means the value of a measurement can get corrected and we want to keep both the original and the corrected version.

This results in the following table structure:

sensor_id start_utc end_utc ingested_utc value quality
  • start_utc and end_utc are the start and end of the measured interval.
  • ingested_utc is the timestamp when we ingested the value. If we get an update of the value, we typically have the same sensor_id, start_utc and end_utc, with a different ingested_utc.
  • value is the measured value for the given interval.
  • quality is an enum that gives info about the quality of the measurement. Things like interpolated and final.

I have created a TimescaleDB hypertable from this table. Now I want to create a continuous aggregate. It needs to adhere to the following rules:

  • Rollup values to hourly intervals.
  • Sum the values of intervals within that hour.
  • If more than one measurement exist, choose the most recently ingested one.

This is the query I've come up with so far:

select
    sensor_id,
    time_bucket('1 hour', start_utc) as start_utc,
    sum(value) as value,
    ingested_utc,
    quality
from
    raw_measurement_data
where
    (sensor_id, start_utc, end_utc, ingested_utc) in (
        select
            sensor_id,
            start_utc,
            end_utc,
            max(ingested_utc)
        from
            raw_measurement_data
        group by
            sensor_id, start_utc, end_utc
        )
group by
    sensor_id, start_utc, ingested_utc, quality
order by sensor_id, start_utc asc
;

This seems to work, at least with my small test data set. However, I'd like to do this more "idiomatic" TimescaleDB. I figure I could used the last() function from TimescaleDB and that this would perform better on a large dataset.

I've quite some experience with PostgreSQL, but I'm new to TimescaleDB. Any tips on improving this query are more than welcome! The end goal is, of course, to create a continuous aggregate from this query.


As requested in the comments, here's the verbose query plan.

Bart Kummel
  • 662
  • 12
  • 18

0 Answers0