0

I have events table in clickhouse. When some user (defined by user_id) come into the room (defined by object_id) or left the room electonic lock opened by key-card should be opened so for each interaction with the lock there are a record in events table with corresponding event_type set to "came" or "left".

CREATE TABLE default.events
(
 `user_id` UInt32,         -- user of event
 `object_id` UInt32,       -- object of event
 `event_type` String,      -- type of event, can be "came" or "left"
 `event_moment_date` Date, -- date of event
 `event_moment` DateTime   -- datetime of event 
)
ENGINE = MergeTree(
    event_moment_date,
    (
        object_id,
        user_id,
        event_moment
    ),
8192)

I need to output data in form:

user_id,     -- user 
object_id,   -- object
came_moment, -- moment then user entered the room
left_moment  -- moment then user leaved the room

So, for every "came" event I need to find matching "left" event that has equal user_id and object_id and has event_datetime in the nearest possible future relatively to "came" event_moment.

I can easily do this in MySQL/Postgrsql with subquery like this:

SELECT
    s1.object_id, 
    s1.user_id, 
    s1.action_moment as "came_moment", 
    (
        select s2.action_moment from source as s2 
        where 
            s1.user_id = s2.user_id 
        and 
            s1.object_id = s2.object_id
        and 
            s1.action_moment < s2.action_moment
        and
            s2.action_type = 'left'
        order by s2.action_moment asc
        limit 1
    ) as "left_moment"
FROM
source as s1
where s1.action_type = 'came'

But Clickhouse doens't allow to use outer query fields in subquery (dependent subquery).

I've also tried using JOINs but Clickhouse doens't allow to use "<" or ">" in ON statement of the JOIN. Also I've tried to use neighbour function of Clickhouse, with order by event_moment but i've achieved only selection of a single row with specific came_moment and corresponding left_moment by ordering data and selecting next neighbour row, it seems that I cannot load the full list of this kind of rows.

I starting to think that my task is impossible in ClickHouse, and I need some kind of script (php/python/whatever) which would scan the database and add some "session_id" field, which should has equal id number for every two rows which are corresponding to each other "came" and "left".

MihanEntalpo
  • 1,952
  • 2
  • 14
  • 31
  • I'm not familiar with clickhouse, but if some logic seems impossible with mysql, I just use my backend programming language which is Java to perform the operations and manipulation of data for me. So in this instance I would try to have Java retrieve a resultset from mysql server for every time a "left_moment" occurs, then search for the most recent "came_moment" in that resultset which has the same userid. – SenorCardgage Feb 08 '21 at 05:24
  • Using backend programming language of course is a possible solution. But in my case I need the data in a form that it can be retrieved by Graphana, so it must be in database not in the code. But the code could be used to fill some additional field (as I mentioned, `session_id`) and make the task easier but I hope that there are some way to achieve this with existing date by just SQL query – MihanEntalpo Feb 08 '21 at 05:53
  • consider using the array-functions - look at the examples that similar to your task: https://stackoverflow.com/questions/57756729/how-to-extract-ascending-subsets-from-the-sequence, https://stackoverflow.com/questions/61903658/get-the-value-for-the-previous-point-of-time-clickhouse-iot-case – vladimir Feb 08 '21 at 05:55
  • Thanks @vladimir will try. Just have some doubts about speed of such queries :) – MihanEntalpo Feb 08 '21 at 07:51

0 Answers0