I have Clickhouse version: 20.8.7.15 running on Ubuntu 18.04.4 LTS with 56 physical CPU cores (112 logical cores with HT enabled) and 256 GB RAM.
I created a MergeTree table which has hourly reporting data and i am trying to run a query which aggregates metrics for 15 days of data. I tried running the query with different values for --max_threads option provided by clickhouse.
I started with --max_threads=7 and kept doubling the value (7, 14, 28, 56, 112) and tried to run the same query. My observation is that execution time reduces to nearly half when i doubled the --max_threads value until --max_threads=28, after that execution time for query was greater with --max_threads=56 than with --max_threads=28
Someone please explain to me, why this might be happening. Am is missing something ?
Table Definition
CREATE TABLE datawarehouse.AggregatedHourly
(
`vertical_id` UInt8,
`line_item_id` Int32,
`campaign_type` UInt8,
`campaign_id` Int32,
`creative_id` Int32,
`creative_type` Int32,
`enterprise_id` Int32,
`advertiser_id` Int32,
`publisher_id` String,
`exchange_id` Int32,
`manufacturer_id` Int32,
`os_id` Int32,
`os_version_id` Int32,
`app_id` String,
`app_bundle_id` String,
`app_name` String,
`IabCategoryId1` Int32,
`IabCategoryId2` Int32,
`IabCategoryId3` Int32,
`site_domain` String,
`user_country_id` Int32,
`user_carrier_id` Int32,
`is_wifi_targeted` UInt8,
`traffic_type` UInt8,
`zone` UInt8,
`is_ssl` UInt8,
`day` Date,
`hour` UInt8,
`deal_id` Int32,
`connection_type` UInt8,
`billing_type` UInt8,
`creative_orientation` String,
`ltv_event_id` Int32,
`ltv_event_currency` Int32,
`ltv_is_reengagement` UInt8,
`conversion_object_id` Int32,
`session_depth` Int16,
`model_version_id` Int16,
`user_state_id` Int32,
`user_city_id` Int32,
`is_rewarded` UInt8,
`is_unknown_device_id` UInt8,
`model_id` Int32,
`rtb_creative_id` Int32,
`d0_day` Date DEFAULT day,
`d0_hour` UInt8 DEFAULT hour,
`auction_type` Int8,
`adhoc` String,
`requests` Int32,
`bids` Int32,
`wins` Int32,
`unique_wins` Int32,
`impressions` Int32,
`unique_impressions` Int32,
`clicks` Int32,
`unique_clicks` Int32,
`installs` Int32,
`unique_installs` Int32
)
ENGINE = MergeTree
PARTITION BY (day, hour, vertical_id)
ORDER BY (enterprise_id)
SETTINGS index_granularity = 8192
Query
SELECT
multiIf(auction_type = 1, 'First Price', auction_type = 2, 'Second Price', auction_type = 0, 'NA', 'Unknown') AS optimization_auction_type,
SUM(unique_wins) AS result_wins,
SUM(unique_impressions) AS result_impressions,
SUM(unique_clicks) AS result_clicks,
SUM(unique_installs) AS result_mmp_click_installs
FROM datawarehouse.AggregatedHourly
WHERE ((day >= '2020-11-01') AND (day <= '2020-11-15')) AND (enterprise_id IN (10049))
GROUP BY optimization_auction_type
Threads | Execution Time (in seconds) |
---|---|
7 | 52.001 |
14 | 27.205 |
28 | 15.511 |
56 | 16.886 |
112 | 25.444 |
Clickhouse is reading 13.77 billion rows while executing the query.
Clickhouse Trace logs on execution of query with 28 and 56 threads