Description
System: Postgres 13 + TimescaleDB
I have a time series of messages containing error codes, generated by devices at intervals of 300 seconds. This time series should be aggregated so that consecutive occurrences of the same error code (over several contiguous intervals) from the same device are grouped into one row and their intervals are merged.
Source Format
ts | code | device |
---|---|---|
2023-03-01 12:00:00 | 4 | A |
2023-03-01 12:00:00 | 3 | B |
2023-03-01 12:05:00 | 3 | B |
2023-03-01 12:10:00 | 3 | B |
2023-03-01 12:10:00 | 4 | B |
2023-03-01 14:00:00 | 3 | A |
2023-03-01 14:30:00 | 4 | A |
Target Format
valid_interval | code | device |
---|---|---|
(2023-03-01 12:00:00, 2023-03-01 12:04:59) | 4 | A |
(2023-03-01 12:00:00, 2023-03-01 12:14:59) | 3 | B |
(2023-03-01 12:10:00, 2023-03-01 12:14:59) | 4 | B |
(2023-03-01 14:00:00, 2023-03-01 14:04:59) | 3 | A |
(2023-03-01 14:30:00, 2023-03-01 14:34:59) | 4 | A |
Progress
I have tried using [LAG()/LEAD()](https://www.postgresql.org/docs/13/functions-window.html) with PARTITION BY device, code, but I cannot get it to work with the condition that only consecutive rows are aggregated:
SELECT ts,
       device,
       code,
       LEAD(ts) OVER (PARTITION BY device, code ORDER BY ts) AS next_ts
FROM source_format;
DB-Fiddle
Schema (PostgreSQL v13)
CREATE TABLE timeseries (
    ts timestamptz,
    code bigint,
    device varchar
);
INSERT INTO timeseries VALUES ('2023-03-01 12:00:00', 1, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:05:00', 1, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:10:00', 1, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:10:00', 2, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:25:00', 1, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:30:00', 1, 'A');
INSERT INTO timeseries VALUES ('2023-03-01 12:00:00', 1, 'B');
INSERT INTO timeseries VALUES ('2023-03-01 12:20:00', 1, 'B');
INSERT INTO timeseries VALUES ('2023-03-01 12:20:00', 3, 'B');
INSERT INTO timeseries VALUES ('2023-03-01 12:25:00', 3, 'B');
Query #1 (time difference to the next message of the same device and code)
SELECT ts,
       device,
       code,
       LEAD(ts) OVER (PARTITION BY device, code ORDER BY ts) - ts AS diff_to_next_ts
FROM timeseries;
Desired Result
valid_interval | device | code |
---|---|---|
(2023-03-01T12:00:00.000Z, 2023-03-01T12:10:00.000Z) | A | 1 |
(2023-03-01T12:25:00.000Z, 2023-03-01T12:30:00.000Z) | A | 1 |
(2023-03-01T12:10:00.000Z, 2023-03-01T12:10:00.000Z) | A | 2 |
(2023-03-01T12:00:00.000Z, 2023-03-01T12:00:00.000Z) | B | 1 |
(2023-03-01T12:20:00.000Z, 2023-03-01T12:20:00.000Z) | B | 1 |
(2023-03-01T12:20:00.000Z, 2023-03-01T12:25:00.000Z) | B | 3 |
How can I include this condition and then merge the start and end of consecutive messages with the same code into a single row with an interval? Is there a more fitting method to use? Would a PL/pgSQL function be more appropriate?
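For reference, one direction that might fit is the "gaps and islands" pattern, which needs only window functions and no procedural code. The following is a minimal sketch against the `timeseries` table from the fiddle, not a tested solution. It assumes that two messages belong to the same run only when they are exactly 5 minutes apart for the same device and code. The names `flagged`, `numbered`, `is_new_run`, and `run_id` are made up for illustration.

```sql
-- Sketch of a gaps-and-islands query on the fiddle's `timeseries` table.
-- Assumption: rows of the same device and code form one run only when
-- each row is exactly 5 minutes (300 s) after the previous one.
WITH flagged AS (
    SELECT ts,
           device,
           code,
           -- 0 when this row continues the previous one, 1 when it starts a new run.
           -- For the first row of a partition LAG() is NULL, so the ELSE branch applies.
           CASE
               WHEN ts - LAG(ts) OVER (PARTITION BY device, code ORDER BY ts)
                    = INTERVAL '5 minutes'
               THEN 0
               ELSE 1
           END AS is_new_run
    FROM timeseries
),
numbered AS (
    SELECT ts,
           device,
           code,
           -- The running sum of the flags gives every run its own number.
           SUM(is_new_run) OVER (PARTITION BY device, code ORDER BY ts) AS run_id
    FROM flagged
)
SELECT tstzrange(MIN(ts), MAX(ts), '[]') AS valid_interval,  -- inclusive on both ends
       device,
       code
FROM numbered
GROUP BY device, code, run_id
ORDER BY device, code, MIN(ts);
```

On the fiddle data this should produce six runs with the same start and end timestamps as the Desired Result table, although `tstzrange` prints them in its own range notation. The inclusive bounds `'[]'` matter here: with the default `'[)'`, a run consisting of a single message would become an empty range. If the interval should instead cover the full 300 seconds of the last message, as in the Target Format table, one option would be `tstzrange(MIN(ts), MAX(ts) + INTERVAL '5 minutes', '[)')`.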