0

In clickhouse, I want to do a query operation. The query contains group by QJTD1, but QJTD1 is obtained by querying the dictionary. The statement is as follows:

`SELECT
IF(
    sale_mode = 'owner',
    dictGetString(
        'dict.dict_sku',
        'dept_id_1',
        toUInt64OrZero(sku_id)
    ),
    dictGetString(
        'dict.dict_shop',
        'dept_id_1',
        toUInt64OrZero(shop_id)
    )
) AS QJTD1,
brand_cd,
coalesce(
    uniq(sd_deal_ord_user_num),
    0
) AS sd_deal_ord_user_num,
0 AS item_uv,
dt
FROM app.test_all
WHERE dt >= '2020-11-01'
AND dt <= '2020-11-30'
and IF(
    sale_mode = 'owner',
    dictGetString(
        'dict.dict_sku',
        'bu_id',
        toUInt64OrZero(sku_id)
    ),
    dictGetString(
        'dict.dict_shop',
        'bu_id',
        toUInt64OrZero(shop_id)
    )
)= '1727' GROUP BY
QJTD1,
brand_cd,
dt
ORDER BY item_pv desc limit 0,
100`

, QJTD1 has serious data skew, resulting in slow query speed. I have tried to optimize the index to improve the query speed. The index is as follows: sku_id,shop_id....but it has no effect. How can I improve the query efficiency?

moneyleaf
  • 11
  • 5

1 Answers1

0

CH calculates both branches of IF (then & else) always.

You can use two-stage group by

select IF( sale_mode ='owner', ... as QJTD1
from (  
  select owner, sku_id, dept_id_1, ....
  ...
  group by owner, sku_id, dept_id_1
  )
group by QJTD1

Or define dictionary <injective>true

https://clickhouse.tech/docs/en/sql-reference/dictionaries/external-dictionaries/external-dicts-dict-structure/

Flag that shows whether the id -> attribute image is injective.
If true, ClickHouse can automatically place after the GROUP BY 
clause the requests to dictionaries with injection. Usually it 
significantly reduces the amount of such requests.

Default value: false.

If they are injective.

And I would test Union all then to calculate IF branches only one time.

Denny Crane
  • 11,574
  • 2
  • 19
  • 30
  • Thank you for your answer, but the data granularity of sku_id and shop_id is more granular, I tried your method, the running time of group by sku_id and sku_id will be longer.I will show you the complete code as follow. – moneyleaf Jan 26 '21 at 03:19