3

I want to know what's the best practice for the partition key. In my project, we have a table with event_date, app_id and other columns. The app_id will be growing and could be thousands. The select query is based on event_date and app_id.

The simple data schema is as below:

CREATE TABLE test.test_custom_partition (
    company_id UInt64,
    app_id String,
    event_date DateTime,
    event_name String ) ENGINE MergeTree() 
PARTITION BY (toYYYYMMDD(event_date), app_id) 
ORDER BY (app_id, company_id, event_date) 
SETTINGS index_granularity = 8192;

the select query is like below:

select event_name from test_custom_partition 
where event_date >= '2020-07-01 00:00:00' AND event_date <= '2020-07-15 00:00:00'
AND app_id = 'test';

I want to use (toYYYYMMDD(event_date), app_id) as the partition key, as the query could read the minimal data parts. But it could cause the partitions more than 1000, from the document I see

A merge only works for data parts that have the same value for the partitioning expression. This means you shouldn't make overly granular partitions (more than about a thousand partitions). Otherwise, the SELECT query performs poorly because of an unreasonably large number of files in the file system and open file descriptors.

Or should I use the partition key only toYYYYMMDD(event_date)?

also, could anyone explain why the partition shouldn't more than 1000 partitions? even if the query only use a small set of the data part, it still could cause performance issue?

Thanks

vladimir
  • 13,428
  • 2
  • 44
  • 70
George Jia
  • 659
  • 6
  • 7
  • look at this answer https://stackoverflow.com/a/60155974/303298 – vladimir Jul 27 '20 at 11:30
  • to optimize queries need much more pay attention to the primary key choosing: I would suggest this way: *PARTITION BY toYYYYMM(event_date) ORDER BY (event_date, app_id, company_id)*. Look at this answer: https://stackoverflow.com/a/62464976/303298 – vladimir Jul 27 '20 at 11:36
  • @vladimir thank you for your suggestion. I have decide to use toYYYYMM(event_date) as partition key. – George Jia Jul 31 '20 at 02:59

0 Answers0