I'm trying to write a query in timestream where I can get how long a device is powered on, and what mode it was in at the time.
So I have two measures for this: power
and operation_mode
.
What I have so far this the following query:
with active_timeseries AS (
SELECT time,
deviceId,
measure_value::boolean as power,
measure_value::varchar as operationMode,
(LEAD(time, 1, NULL) OVER (ORDER BY time ASC) - time) AS time_interval
FROM "MCH_Telemetry"."Telemetry"
where measure_name IN ('power', 'operation_mode')
AND time between '2022-09-27' and '2022-10-04'
AND deviceId = '0798bbb2-bae6-44c3-9b53-f729fdaa7c6d'
ORDER by time ASC
)
which returns the following data:
time | deviceId | power | operationMode | time_interval |
---|---|---|---|---|
2022-09-27 07:29:06.244000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | true | - | 0 00:00:00.000000000 |
2022-09-27 07:29:06.244000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | - | Cool | 0 00:33:59.363000000 |
2022-09-27 08:03:05.607000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | false | - | 0 00:27:59.517000000 |
2022-09-27 08:31:05.124000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | true | - | 0 00:02:59.743000000 |
2022-09-27 08:34:04.867000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | - | Dry | 0 00:03:00.794000000 |
2022-09-27 08:37:05.661000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | - | Fan | 0 00:02:59.536000000 |
2022-09-27 08:40:05.197000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | - | Automatic | 0 00:02:59.801000000 |
Which so far is ok, but proves and issue as there are some missing values.
So for each missing value, I want to use the previous value for that measure so then I can easily group by, and have tried the following:
SELECT
deviceId,time,
if(measure_name = 'power', measure_value::boolean, LAG(measure_value::boolean, 1) OVER (partition by deviceId ORDER BY time ASC)) AS power,
if(measure_name = 'operation_mode', measure_value::varchar) AS mode,
(LEAD(time, 1, NULL) OVER (ORDER BY time ASC) - time) AS time_interval
FROM "MCH_Telemetry"."Telemetry"
where time between '2022-09-27' and '2022-10-04'
AND deviceId = '0798bbb2-bae6-44c3-9b53-f729fdaa7c6d'
and measure_name IN ('power', 'operation_mode')
ORDER BY deviceId
,time ASC
Notice the LAG
function: I'm testing if the value is null and if it is, use the previous value. This works fine if it's only one missing value in a row, but if there are multiple sequentially then it only populates the first one.
I've had a look into interpolation, but don't think that will help as I don't want to bin any data, I want the raw timings as they are recorded and just repeat in the rows.
So any ideas how I can get the data to repeat for missing values so I end up with something like:
time | deviceId | power | operationMode | time_interval |
---|---|---|---|---|
2022-09-27 07:29:06.244000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | true | Cool | 0 00:00:00.000000000 |
2022-09-27 07:29:06.244000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | true | Cool | 0 00:33:59.363000000 |
2022-09-27 08:03:05.607000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | false | Cool | 0 00:27:59.517000000 |
2022-09-27 08:31:05.124000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | true | Cool | 0 00:02:59.743000000 |
2022-09-27 08:34:04.867000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | true | Dry | 0 00:03:00.794000000 |
2022-09-27 08:37:05.661000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | true | Fan | 0 00:02:59.536000000 |
2022-09-27 08:40:05.197000000 | 0798bbb2-bae6-44c3-9b53-f729fdaa7c6d | true | Automatic | 0 00:02:59.801000000 |