What about using a fix schema in ClickHouse that would support any type in your events. For example using Nested columns or JSON columns. I tried this for a PoC using Nested columns with good results. First I identified 4 different data types: strings
, numbers
, booleans
and dates
.
The ClickHouse schema would look like this
CREATE TABLE custom_events
(
// ... Normal columns needed depending on your use case
attribute_string Nested
(
name String,
value String
),
attribute_number Nested
(
name String,
value Float32
),
attribute_boolean Nested
(
name String,
value UInt8
),
attribute_date Nested
(
name String,
value Datetime('UTC')
)
)
ENGINE = MergeTree()
ORDER BY (
...
);
The Nested columns will be converted into two columns of Array type by ClickHouse. To use this schema you would need to group the events fields by type and flatten them before proceeding to insert into ClickHouse. For example the following event in JSON
{
"event_id": 1,
"event_date": "2018-08-02 10:06:23",
"data": {
"rides": [
{
"km": 93,
"vehicle": "plane",
"people": 2.15,
"finished": true,
"date": "2019-06-24T18:52:34"
}
]
}
}
Could be flattened like this
{
"event_id": 1,
"event_date": "2018-08-02 10:06:23",
"data.rides.0.km": 93,
"data.rides.0.vehicle": "plane",
"data.rides.0.people": 2.15,
"data.rides.0.finished": true,
"data.rides.0.date": "2019-06-24T18:52:34"
}
And then you can insert into ClickHouse like so
insert into custom_events VALUES (1, "2018-08-02 10:06:23",
['data.rides.0.vehicle'], ['plane'], // This is Key and Value for strings
['data.rides.0.km', 'data.rides.0.km'], [93, 2.15] // This is Key and Value for numbers
... // do the same for each data type
)
Then you can query the data leveraging all the great high order functions provided by ClickHouse
SELECT
arrayFilter(x -> ((x.1) = 1), arrayMap((x, y) -> (x LIKE 'data.rides.%.km', y), attribute_number.name, attribute_number.value)).2 AS element,
event_date
FROM custom_events
WHERE (event_id = 1) AND notEmpty(element)
The query doesn't make sense as I redacted the fields a bit, but you can get the idea. This way you don't need to modify your schema and you can store any arbitrary JSON schema with the same fix schema in ClickHouse.