0

I am trying to use Clickhouse for a small analytics app of mine and I have a table that records raw hits as:

CREATE TABLE hits (
  sessionId LowCardinality(String),
  page LowCardinality(String),
  timestamp DateTime,
  projectId UInt16
) ENGINE = MergeTree() PARTITION BY toYYYYMM(timestamp)
ORDER BY (projectId, page, toStartOfHour(timestamp)) --
  SETTINGS index_granularity = 8192;

Afterwards I can add some sample data as:

sessionId page    timestamp            projectId 
xxx       /       2021-03-12 13:51:12  1         
yyy       /       2021-03-12 13:51:12  1         
xxx       /cool   2021-03-12 13:52:12  1         
fff       /       2021-03-12 13:53:12  1                 

What I am trying to achieve is calculating bounces (unique sessionId occurunce) and views per page, something like:

page   bounces views projectId
/      2       3     1
/cool  0       1     1

I can easily count the views per page but the unique sessionId counting is failing due to the GROUP BY clause:

SELECT page,
  projectId,
  count(*) as views,
  count(DISTINCT sessionId) as bounces --fail
from hits
GROUP BY (page, projectId);

Any ideas, workarounds on changing the Clickhouse schema or even using some of engine from Clickhouse for aggregation would be highly appreciated.

vorillaz
  • 6,098
  • 2
  • 30
  • 46
  • Your table hits is incorrect. Why MergeTree has AggregateFunction? MergeTree wont collapse AggregateFunction. – Denny Crane Mar 09 '21 at 14:52
  • What is the test.fooagr ? – Denny Crane Mar 09 '21 at 14:52
  • use `uniqExact(sessionId)` or better `uniq(sessionId)` instead of `count(DISTINCT sessionId)` – Denny Crane Mar 09 '21 at 14:54
  • Hey @DennyCrane, `AggregateFunction` and `test.fooagr` where typos due to bad copy-pasting and trying to simplify the snippets. Using `uniq(sessionId)` or `count(DISTINCT sessionId)` does not count bounces as the `sessionId` occurs in another group since I want found out which pages have the most bounces. – vorillaz Mar 09 '21 at 16:45

1 Answers1

1

check https://clickhouse.tech/docs/en/sql-reference/aggregate-functions/parametric-functions/#function-sequencecount

select projectId, p[1] page, countIf(length(p)=1) bounce
from (
SELECT 
  projectId, sessionId,
  groupArray( page ) p
from hits
GROUP BY sessionId, projectId )
group by projectId,page

Denny Crane
  • 11,574
  • 2
  • 19
  • 30
  • Hey there, thanks a lot for the answer, it actually works like a charm. I took a look at `sequenceCount` but I am not quite sure how it can be used with parametric values. Would you mind updating the answer with an example? – vorillaz Mar 10 '21 at 10:35