0

I have a ClickHouse schema as following, MergeTree is in question:

(
hotel String,
staff_member String,
task_number Float64,
date DateTime
)
PRIMARY KEY (hotel, date)
ORDER BY (hotel, date)

My aggregation is as following:

SELECT
    staff_member,
    sum(task_number)
FROM ...
WHERE 
    hotel = {hotel}
AND date >= {first_date}
AND date <= {top_date}
GROUP BY staff_member

Basically, I'm aggregating the number of tasks of a staff member over a period of time, but the aggregation is kind of slow. I have a feeling the primary key is off and I need to rework it. First that comes to mind would be to change the key to (hotel, staff_member, date) since I'm grouping by the staff_member

I'm thankful for any help!

Alek Yo
  • 63
  • 8
  • I would define more monotonic key: *(date, hotel)* (*date* come first). If these aggregates are required regularly then consider using precalculated aggregates - see [AggregatingMergeTree engine](https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/aggregatingmergetree/). (look at https://stackoverflow.com/a/62464976/303298). – vladimir Mar 13 '21 at 23:37
  • @vladimir Why date first, if you could tell me a bit more detail? I need to be able to change the date window on will, so would the AggregatingMergeTree still fit this criteria? – Alek Yo Mar 14 '21 at 20:18
  • If date first then you data will be almost sorted before insertion. But actually `(hotel, date)` is also good if it's common to have `WHERE hotel = {hotel}` in queries. In documentation [there are](https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/mergetree/#primary-keys-and-indexes-in-queries) similar example with `(CounterID, Date)`. – Deffe Mar 16 '21 at 15:58

0 Answers0