Given some sensor data in a clickhouse table with values every 30 seconds but also some data gaps, I would like to resample the data to 30 seconds and then forward fill the gaps but only if the gaps is shorter than a certain limit.
(similar to method_2 in Python here: https://stackoverflow.com/a/40442467/17080022).
Raw data:
+------------+---------+-------------------+-----+
|equipment_id|sensor_id|datetime |value|
+------------+---------+-------------------+-----+
|123 |456 |2021-02-05 09:18:46|63 |
|123 |456 |2021-02-05 09:19:07|64 |
|123 |456 |2021-02-05 09:18:12|63 |
|123 |456 |2021-02-05 09:22:11|63 |
|123 |456 |2021-02-05 09:22:38|65 |
|123 |456 |2021-02-05 09:24:22|65 |
|123 |456 |2021-02-05 09:24:53|65 |
|123 |456 |2021-02-05 09:24:58|66 |
|123 |456 |2021-02-05 09:26:38|62 |
|123 |456 |2021-02-05 09:27:13|82 |
|123 |456 |2021-02-05 09:27:52|76 |
+------------+---------+-------------------+-----+
Following code already resamples to 30 seconds, but fills all gaps with 0:
SELECT
equipment_id,
sensor_id,
datetime,
value
FROM table_name
WHERE
datetime > '2021-01-05 09:18:00' AND
datetime < '2021-01-05 09:28:00' AND
sensor_id IN (456) AND
equipment_id = 123
GROUP BY equipment_id, sensor_id, toStartOfInterval(datetime, INTERVAL 30 second) AS datetime
ORDER BY datetime WITH FILL STEP 30
Output:
+------------+------------+-------------------+-----+
|equipment_id|sensor_id |datetime |value|
+------------+------------+-------------------+-----+
|123 |456 |2021-01-05 09:18:30|63 |
|123 |456 |2021-01-05 09:19:00|64 |
|0 |0 |2021-01-05 09:19:30|0 |
|0 |0 |2021-01-05 09:20:00|0 |
|0 |0 |2021-01-05 09:20:30|0 |
|0 |0 |2021-01-05 09:21:00|0 |
|0 |0 |2021-01-05 09:21:30|0 |
|123 |456 |2021-01-05 09:22:00|63 |
|123 |456 |2021-01-05 09:22:30|65 |
|0 |0 |2021-01-05 09:23:00|0 |
|0 |0 |2021-01-05 09:23:30|0 |
|123 |456 |2021-01-05 09:24:00|65 |
|123 |456 |2021-01-05 09:24:30|65.5 |
|0 |0 |2021-01-05 09:25:00|0 |
|0 |0 |2021-01-05 09:25:30|0 |
|0 |0 |2021-01-05 09:26:00|0 |
|123 |456 |2021-01-05 09:26:30|62 |
|123 |456 |2021-01-05 09:27:00|82 |
|123 |456 |2021-01-05 09:27:30|76 |
+------------+------------+-------------------+-----+
My desired output with forward filled gaps up to a length of 3 rows (or 90 seconds) would be:
+------------+------------+-------------------+--------------+--------------+
|equipment_id|sensor_id |datetime |original_value|desired_value |
+------------+------------+-------------------+--------------+--------------+
|123 |456 |2021-01-05 09:18:30|63 |63 |
|123 |456 |2021-01-05 09:19:00|64 |64 |
|123 |456 |2021-01-05 09:19:30|0 |null |
|123 |456 |2021-01-05 09:20:00|0 |null |
|123 |456 |2021-01-05 09:20:30|0 |null |
|123 |456 |2021-01-05 09:21:00|0 |null |
|123 |456 |2021-01-05 09:21:30|0 |null |
|123 |456 |2021-01-05 09:22:00|63 |63 |
|123 |456 |2021-01-05 09:22:30|65 |65 |
|123 |456 |2021-01-05 09:23:00|0 |65 |
|123 |456 |2021-01-05 09:23:30|0 |65 |
|123 |456 |2021-01-05 09:24:00|65 |65 |
|123 |456 |2021-01-05 09:24:30|65.5 |65.5 |
|123 |456 |2021-01-05 09:25:00|0 |65.5 |
|123 |456 |2021-01-05 09:25:30|0 |65.5 |
|123 |456 |2021-01-05 09:26:00|0 |65.5 |
|123 |456 |2021-01-05 09:26:30|62 |62 |
|123 |456 |2021-01-05 09:27:00|82 |82 |
|123 |456 |2021-01-05 09:27:30|76 |76 |
+------------+------------+-------------------+--------------+--------------+
Can this be done in clickhouse?