How can I get distribution of filed's value in influx db

Question

I use influx db to save my website's performance data。The data in my influx db like below:

//mock data
time                       load
----                       ----
2018-11-27T08:34:46.899z   262
2018-11-27T08:35:46.899z   300
2018-11-27T08:36:46.899z   102
2018-11-27T08:37:46.899z   1000
2018-11-27T08:38:46.899z   800
2018-11-27T08:39:46.899z   6000
2018-11-27T08:40:46.899z   326
2018-11-27T08:41:46.899z   500
2018-11-27T08:42:46.899z   90
2018-11-27T08:43:46.899z   902

My purpose is to get the distribution of the load value. For example, I want to get the load time's distribution in the form like below:

load      frequent
----      ----
0~100     1  
100~200   1  
200~300   2 
300~400   2
.
.
.
5000~6000 1

I try to resolve this problem in two ways, but I thought it not the best way to get this result.

(一) Query the whole load time, and then in my nodejs server to calculate this result. I think this solution will meet performance problem when I want to get one years or more time duration's distribution.

(二) At the same time of saving my raw data into influx db , I calculate the range of load time then save it into another influx db table, calculate table like below:

// load distribute table
time       100~200  200~300  300~400  ....  5000~6000
----        ----     ----     ----           ----
08:34:46     1
08:34:47              1
08:34:48     1
08:34:49                       1
08:34:50                       1
  

// then in the future I can get the distribution like this:
select count(*) from load_distribute

But I think this solution may also have disadvantage such like waste the space of host machine, as you know this table is so sparse.

score 0 · Answer 1 · answered Dec 04 '18 at 16:36

I'm not aware of the way to do a range grouping in the InfluxDB alone (other than by time interval).

Moreover, even relational DBs either rely on SQL extensions like CASE, or subquery joins there - though Influx is none of a relational one.

Staying within TICK stack, you can do it with Kapacitor processing, though, but here's the thing: it a TIMESERIES database. The timing is the key, and I don't see where the time factor is involved there...

So, perhaps, the best solution would be to do a series of separate queries that's gonna count the occurrences of values in each range, and then put it together in your app.

Or join the same things in the Kapacitor by time intervals (like, batch it every 10 min), settle it into one measurement, where the value names would represent load ranges and the count would be the value (load0_99=3,load100_199=24, etc) - and then you'd be able to do it in one query.

How can I get distribution of filed's value in influx db

1 Answers1