1

I'm trying to write a query in timestream where I can get how long a device is powered on, and what mode it was in at the time.

So I have two measures for this: power and operation_mode.

What I have so far this the following query:

with active_timeseries AS (
  SELECT time,
  deviceId,     
    measure_value::boolean as power,
    measure_value::varchar as operationMode,
    (LEAD(time, 1, NULL) OVER (ORDER BY time ASC) - time) AS time_interval
  FROM "MCH_Telemetry"."Telemetry"
  where measure_name IN ('power', 'operation_mode')
    AND time between '2022-09-27' and '2022-10-04'
AND deviceId = '0798bbb2-bae6-44c3-9b53-f729fdaa7c6d'
  ORDER by time ASC
)

which returns the following data:

time deviceId power operationMode time_interval
2022-09-27 07:29:06.244000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d true - 0 00:00:00.000000000
2022-09-27 07:29:06.244000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d - Cool 0 00:33:59.363000000
2022-09-27 08:03:05.607000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d false - 0 00:27:59.517000000
2022-09-27 08:31:05.124000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d true - 0 00:02:59.743000000
2022-09-27 08:34:04.867000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d - Dry 0 00:03:00.794000000
2022-09-27 08:37:05.661000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d - Fan 0 00:02:59.536000000
2022-09-27 08:40:05.197000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d - Automatic 0 00:02:59.801000000

Which so far is ok, but proves and issue as there are some missing values.

So for each missing value, I want to use the previous value for that measure so then I can easily group by, and have tried the following:

SELECT
            deviceId,time,
if(measure_name = 'power', measure_value::boolean, LAG(measure_value::boolean, 1) OVER (partition by deviceId ORDER BY time ASC)) AS power,
if(measure_name = 'operation_mode', measure_value::varchar) AS mode,
(LEAD(time, 1, NULL) OVER (ORDER BY time ASC) - time) AS time_interval
FROM  "MCH_Telemetry"."Telemetry"
        where time between '2022-09-27' and '2022-10-04'
AND deviceId = '0798bbb2-bae6-44c3-9b53-f729fdaa7c6d'
and measure_name IN ('power', 'operation_mode')
ORDER BY deviceId
    ,time ASC

Notice the LAG function: I'm testing if the value is null and if it is, use the previous value. This works fine if it's only one missing value in a row, but if there are multiple sequentially then it only populates the first one.

I've had a look into interpolation, but don't think that will help as I don't want to bin any data, I want the raw timings as they are recorded and just repeat in the rows.

So any ideas how I can get the data to repeat for missing values so I end up with something like:

time deviceId power operationMode time_interval
2022-09-27 07:29:06.244000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d true Cool 0 00:00:00.000000000
2022-09-27 07:29:06.244000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d true Cool 0 00:33:59.363000000
2022-09-27 08:03:05.607000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d false Cool 0 00:27:59.517000000
2022-09-27 08:31:05.124000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d true Cool 0 00:02:59.743000000
2022-09-27 08:34:04.867000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d true Dry 0 00:03:00.794000000
2022-09-27 08:37:05.661000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d true Fan 0 00:02:59.536000000
2022-09-27 08:40:05.197000000 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d true Automatic 0 00:02:59.801000000
ADringer
  • 2,614
  • 36
  • 63

0 Answers0