I am trying to use Clickhouse for a small analytics app of mine and I have a table that records raw hits as:
CREATE TABLE hits (
sessionId LowCardinality(String),
page LowCardinality(String),
timestamp DateTime,
projectId UInt16
) ENGINE = MergeTree() PARTITION BY toYYYYMM(timestamp)
ORDER BY (projectId, page, toStartOfHour(timestamp)) --
SETTINGS index_granularity = 8192;
Afterwards I can add some sample data as:
sessionId page timestamp projectId
xxx / 2021-03-12 13:51:12 1
yyy / 2021-03-12 13:51:12 1
xxx /cool 2021-03-12 13:52:12 1
fff / 2021-03-12 13:53:12 1
What I am trying to achieve is calculating bounces (unique sessionId occurunce) and views per page, something like:
page bounces views projectId
/ 2 3 1
/cool 0 1 1
I can easily count the views per page but the unique sessionId
counting is failing due to the GROUP BY
clause:
SELECT page,
projectId,
count(*) as views,
count(DISTINCT sessionId) as bounces --fail
from hits
GROUP BY (page, projectId);
Any ideas, workarounds on changing the Clickhouse schema or even using some of engine from Clickhouse for aggregation would be highly appreciated.