0

I have a table in Clickhouse with the structure:

x_id | y_id | z_id | rank | timestamp
1231 | 1324 | 9412 | 1    | 2021-03-12 00:13:34
121  | 5524 | 765  | 21   | 2021-03-13 15:43:21
54   | 76   | 8822 | 125  | 2021-05-14 17:23:12
213  | 61   | 7651 | 51   | 2021-03-16 12:15:43
53   | 65   | 123  | 23   | 2021-03-12 13:28:54
1231 | 432  | 7651 | 1541 | 2021-03-12 16:54:24
...

There are no records for particular groups (x_id, y_id, z_id) for some weeks and in this case I need to take the previous rank (from the previous week) for this group (x_id, y_id, z_id) if a value exists.

For example:

group_ids, rank, timestamp
(1, 1, 1, 25, '2021-03-12 00:13:34') -> group (1, 1, 1), week 2021-03-08
(2, 2, 2, 30, '2021-03-16 00:13:34') -> group (2, 2, 2), week 2021-03-15

no data for group (1, 1, 1) for week 2021-03-15 - fill from the previous week and set "week" as the current week:

(1, 1, 1, 25, 2021-03-15)

and so on ...

and than count metrics for this data using subquery

SELECT
    week,
    SUM(CASE
        WHEN rank BETWEEN 1 AND 3 THEN 1
        ELSE 0
    END) AS metric1,
    /* ... */
FROM (
    SELECT min(rank) AS rank, toStartOfWeek(Timestamp, 1) AS week FROM table GROUP BY week, x_id, y_id, z_id
) GROUP BY week ORDER BY week;


metric1 | metric2 |  week
0       |  2      |  2021-03-22
1       |  0      |  2021-03-29 
0       |  1      |  2021-04-05

Is it possible to build a query with forward filling missing values?

Andrii
  • 33
  • 1
  • 5

2 Answers2

2

You can use WITH FILL modifier: https://clickhouse.com/docs/en/sql-reference/statements/select/order-by/#orderby-with-fill

I guess you have this part in your query:

ORDER BY week

Just expand it to

ORDER BY week WITH FILL STEP 7
Andrei Koch
  • 898
  • 1
  • 7
  • 23
1

I think it is better to do gaps filling on the server-side (in code of app consumed this result).

Nevertheless, on ClickHouse side consider this way:

SELECT 
    week, 
    last_value(metric1) OVER w AS metric1
    /*, ..*/
FROM (
    SELECT 
        toStartOfWeek(timestamp) week, 
        toNullable(minIf(rank, rank < 100)) metric1 
        /*, .. */
    FROM (
        /* Emulate the test dataset. */
        SELECT data.1 x_id, data.2 y_id, data.3 z_id, data.4 rank, toDateTime(data.5) timestamp
        FROM (
            SELECT arrayJoin([
                (1231, 1324, 9412, 1   , '2021-03-12 00:13:34'),
                (121 , 5524, 765 , 21  , '2021-03-13 15:43:21'),
                (54  , 76  , 8822, 125 , '2021-05-14 17:23:12'),
                (213 , 61  , 7651, 51  , '2021-03-16 12:15:43'),
                (53  , 65  , 123 , 23  , '2021-03-12 13:28:54'),
                (1231, 432 , 7651, 1541, '2021-03-12 16:54:24')]) data
            )
        )        
    GROUP BY week
    ORDER BY week WITH FILL STEP 7
)
WINDOW w AS (ORDER BY week ROWS BETWEEN 100 PRECEDING AND CURRENT ROW)
SETTINGS allow_experimental_window_functions = 1

/*
┌───────week─┬─metric1─┐
│ 2021-03-07 │       1 │
│ 2021-03-14 │      51 │
│ 2021-03-21 │      51 │
│ 2021-03-28 │      51 │
│ 2021-04-04 │      51 │
│ 2021-04-11 │      51 │
│ 2021-04-18 │      51 │
│ 2021-04-25 │      51 │
│ 2021-05-02 │      51 │
│ 2021-05-09 │       0 │
└────────────┴─────────┘
*/

Remarks:

  • metric value is marked as Nullable (see toNullable call) to fill missed values by NULL not 0
  • the window size is defined as 100 - it is enough when the gap is less than 100 weeks (increase this value if required)

See for more info:

vladimir
  • 13,428
  • 2
  • 44
  • 70
  • Thanks for the answer! I've added examples to my question. The main problem that I need to group by week and multiple fields (week, x_id, y, id_, z_id) and create gaps and fill them from previous week for each group of fields if there are no values for this group every next week. I would like to solve this issue using Clickhouse, but if it isn't possible, I'll fill gaps on the server-side. – Andrii Nov 16 '21 at 12:01