0

The base table

CREATE TABLE IF NOT EXISTS test_sessions
(
    session_id   UInt64,
    session_name String,
    created_at   DateTime
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (session_id);

With the following data

INSERT INTO test_sessions (session_id, session_name, created_at) VALUES
(1, 'start', '2021-01-31 00:00:00'),
(1, 'stop', '2021-01-31 01:00:00'),
(2, 'start', '2021-01-31 01:00:00')
;

Created 2 materialized views to get closed sessions

CREATE MATERIALIZED VIEW IF NOT EXISTS test_session_aggregate_states
(
    session_id UInt64,
    started_at AggregateFunction(minIf, DateTime, UInt8),
    stopped_at AggregateFunction(maxIf, DateTime, UInt8)
)
ENGINE = AggregatingMergeTree
PARTITION BY tuple()
ORDER BY (session_id)
POPULATE AS
SELECT session_id,
       minIfState(created_at, session_name = 'start') AS started_at,
       maxIfState(created_at, session_name = 'stop')  AS stopped_at
FROM test_sessions
GROUP BY session_id;

CREATE VIEW IF NOT EXISTS test_session_completed
(
    session_id UInt64,
    started_at DateTime,
    stopped_at DateTime
)
AS
SELECT session_id,
       minIfMerge(started_at) AS started_at,
       maxIfMerge(stopped_at) AS stopped_at
FROM test_session_aggregate_states
GROUP BY session_id
HAVING (started_at != '0000-00-00 00:00:00') AND
       (stopped_at != '0000-00-00 00:00:00')
;

It works normally: return 1 row with existing "start" and "stop"

SELECT * FROM test_session_completed;
-- 1,2021-01-31 00:00:00,2021-01-31 01:00:00

Trying to create a materialized view based on test_session_completed with joins to other tables (there are no joins in the example)

CREATE MATERIALIZED VIEW IF NOT EXISTS test_mv
(
    session_id UInt64
)
ENGINE = MergeTree
PARTITION BY tuple()
ORDER BY (session_id)
POPULATE AS
SELECT session_id
FROM test_session_completed
;

Writing a test queries to test the test_mv

INSERT INTO test_sessions (session_id, session_name, created_at) VALUES
(3, 'start', '2021-01-31 02:00:00'),
(3, 'stop', '2021-01-31 03:00:00');

SELECT * FROM test_session_completed;
-- SUCCESS
-- 3,2021-01-31 02:00:00,2021-01-31 03:00:00
-- 1,2021-01-31 00:00:00,2021-01-31 01:00:00

SELECT * FROM test_mv;
-- FAILURE
-- 1
-- EXPECTED RESULT
-- 3
-- 1

How to fill test_mv based on test_session_completed ?

ClickHouse version: 20.11.4.13

cetver
  • 11,279
  • 5
  • 36
  • 56

1 Answers1

2
  1. Impossible to create MV over view.
  2. MV is an insert trigger and it's impossible to get state completed without having state started in the same table. If you don't need to check that started happen before completed then you can make simpler MV and just check where completed.
  3. You don't need minIfState you can use min (SimpleAggregateFunction). It will reduce stored data and will improve performance.
  4. I think the second MV is excessive.

Check this: https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf

https://youtu.be/ckChUkC3Pns?list=PLO3lfQbpDVI-hyw4MyqxEk3rDHw95SzxJ&t=9371


I would do this:

CREATE TABLE IF NOT EXISTS test_sessions
(
    session_id   UInt64,
    session_name String,
    created_at   DateTime
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (session_id);


CREATE MATERIALIZED VIEW IF NOT EXISTS test_session_aggregate_states
(
    session_id UInt64,
    started_at SimpleAggregateFunction(min, DateTime),
    stopped_at SimpleAggregateFunction(max, DateTime)
)
ENGINE = AggregatingMergeTree
PARTITION BY tuple()
ORDER BY (session_id)
POPULATE AS
SELECT session_id,
       minIf(created_at, session_name = 'start') AS started_at,
       maxIf(created_at, session_name = 'stop')  AS stopped_at
FROM test_sessions
GROUP BY session_id;

INSERT INTO test_sessions (session_id, session_name, created_at) VALUES
(3, 'start', '2021-01-31 02:00:00'),
(3, 'stop', '2021-01-31 03:00:00');


completed sessions:

SELECT session_id,
       min(started_at) AS started_at,
       max(stopped_at) AS stopped_at
FROM test_session_aggregate_states
GROUP BY session_id
HAVING (started_at != '0000-00-00 00:00:00') AND
       (stopped_at != '0000-00-00 00:00:00');

┌─session_id─┬──────────started_at─┬──────────stopped_at─┐
│          1 │ 2021-01-31 00:00:00 │ 2021-01-31 01:00:00 │
└────────────┴─────────────────────┴─────────────────────┘

And using argMaxState you can aggregate multiple start stop within one session_id

Denny Crane
  • 11,574
  • 2
  • 19
  • 30