0

We have a set of devices and all of them have sensors. All devices have some common set of sensors, but some devices have additional sensors. Every sensor has different discretization level and some sensors could change sometimes very fast, and sometimes could not change for some time. For example, we have DeviceA and have a stream of packets in a form(NULL means that value doesn't change):

Timestamp, Temp, Latitude, Longitude, Speed...
111, 20, 54.111, 23.111, 10
112, 20, NULL, NULL, 13
113, 20, NULL, 23.112, 15

And DeviceB:

Timestamp, Temp, Latitude, Longitude, Speed..., AdditionalSensor
111, 24, 54.111, 23.121, 10 ... 1
112, 23, 55.111, 23.121, 13 ... 2
113, 23, 55.111, 23.122, 15 ... 1

After some time new sensors could be added to some device. Every sensor could be any of numeric types(Int32, UInt8, Float32)

After that data will be used to calculate: dau, mau, retention, GPS coordinates clustering and so on.

We could simply create some table:

CREATE TABLE Sensors
(
        Date Date,
        Id FixedString(16),
        DeviceNumber FixedString(16),
        TimeUtc DateTime,
        DeviceTime DateTime, 
        Version Int32, 
        Mileage Int32, 
        Longitude Float64, 
        Latitude Float64, 
        AccelX Float64, 
        AccelY Float64, 
        AccelZ Float64
        ...
) ENGINE = MergeTree(Date, (DeviceNumber, TimeUtc), 8192);

But two problems here: no support for a different set of sensors and sometimes we have null values for some sensor values in case of no changes and it would be great to see last non null value before a timestamp.

The first problem we could solve by creating a table with fields: SensorName, Timestamp, Date, Value. But how to choose correct type? Should we use different tables for different types? Probably we need to use graphite engine, unfortunately, I have no any experience with that. So any help is really appreciated. It would be great to have possibility to keep only changed values of any sensor.

Update

I found a way how to deal with null values. We could use "anyLast" function to request last received value for a column:

SELECT anyLast(Lights) FROM test where TimeUtc <= toDateTime('2017-11-07 11:13:59');

Unfortunately we can't fill all missing values using some kind of overlapping window functions(no support for them in clickhouse). So in case of nullable field aggregate function will use only not null values and in case of non-nullable field all values including zero values will be used and both ways are incorrect. A workaround is to fill null values before insert using select with anyLast values for all null values in a row.

Hodza
  • 3,118
  • 26
  • 20

1 Answers1

1

You can use Clickhouse like a time-series database.

Your table definition is restricting you from having dynamic metrics. That's why you are trying to deal with NULL values.

You can use this table for sensor values:

CREATE TABLE ts1(
    entity String,
    ts UInt64, -- timestamp, milliseconds from January 1 1970
    s Array(String), -- names of the sensors
    v Array(Float32), -- sensor values
    d Date MATERIALIZED toDate(round(ts/1000)), -- auto generate date from ts column
    dt DateTime MATERIALIZED toDateTime(round(ts/1000)) -- auto generate date time from ts column
) ENGINE = MergeTree(d, entity, 8192)

Here we are loading sensor values of device A:

INSERT INTO ts1(entity, ts, s, v) 
VALUES ('deviceA', 1509232010254, ['temp','lat','long','speed'], [24, 54.111, 23.121, 11])

Querying deviceA temp data:

SELECT 
    entity, 
    dt, 
    ts, 
    v[indexOf(s, 'temp')] AS temp
FROM ts1 
WHERE entity = 'deviceA'

┌─entity─┬──────────────────dt─┬────────────ts─┬─temp─┐
│ deviceA│ 2017-10-28 23:06:50 │ 1509232010254 │  24  │
└────────┴─────────────────────┴───────────────┴──────┘

Check this full answer to get a detailed usage.

ramazan polat
  • 7,111
  • 1
  • 48
  • 76