
Using InfluxDB: is there any way to build a time-bucketed report of a field value representing a state that persists over time, ideally in the InfluxQL query language?

More specifically, as an example: say a measurement contains points that report changes in a light bulb's state (On/Off). They could be 0s and 1s, as in the example below, or any other values:

time                   light
----                   -----
2022-03-18T00:00:00Z   1
2022-03-18T01:05:00Z   0
2022-03-18T01:55:00Z   0
2022-03-18T02:30:00Z   1
2022-03-18T04:06:00Z   0

The result should be a listing of time intervals indicating whether the light was On or Off during each interval (e.g. hourly), or for what fraction of that time it was On. For the given example, grouping hourly, the result should be:

Hour               Value
----               -----
2022-03-18 00:00   1.00
2022-03-18 01:00   0.08
2022-03-18 02:00   0.50
2022-03-18 03:00   1.00
2022-03-18 04:00   0.10

Note that:

  • for the 1am bucket, the light starts the hour On but is switched Off at 01:05, so it was On for only 5 of 60 minutes (5/60 ≈ 0.08)
  • and more importantly, the bucket from 3am to 4am has value 1.00 because the light has been On since 02:30, even though no point was reported during that hour. This rules out a simple aggregation (e.g. MEAN) over a GROUP BY time(), as there would be no way to tell whether an empty/missing bucket corresponds to an On or Off state; that depends entirely on the last reported value before the bucket (see the naive query below).

Is there a way to implement it in pure InfluxQL, without retrieving potentially big data sets (points) and iterating through them in a client?
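
For illustration, a naive aggregation over the example data would look like this (a sketch; the measurement name "test3" is borrowed from the answer below):

SELECT mean("light") FROM "test3" WHERE time >= '2022-03-18T00:00:00Z' AND time < '2022-03-18T05:00:00Z' GROUP BY time(1h)

This returns nothing for the empty 03:00 bucket and 0.00 for the 01:00 bucket (the mean of the two reported 0s) instead of the time-weighted 0.08, which is exactly the problem described above.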


1 Answer


Let's assume the raw data can be obtained with this query:

SELECT "light" FROM "test3" WHERE $timeFilter

Where "test3" is your measurement name and $timeFilter is from... to... time period. In this case we need to use a subquery which will fill our data, let's consider grouping (resolution) time as 1s:

SELECT last("light") as "filled_light" FROM "test3" WHERE $timeFilter GROUP BY time(1s) fill(previous)

This query gives us a 0/1 value every second. We will use it as a subquery.
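
Conceptually, around the 01:05 state change in the question's example data, the filled series would look like this (each empty 1s bucket carries the last known value forward):

time                   filled_light
----                   ------------
2022-03-18T01:04:58Z   1
2022-03-18T01:04:59Z   1
2022-03-18T01:05:00Z   0
2022-03-18T01:05:01Z   0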

NOTE: Be aware that this approach does not know whether the light was On or Off at the beginning of the $timeFilter period. fill(previous) can only carry values forward from the first reported point inside $timeFilter, so buckets before that first point will contain no data.
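
One possible workaround (a sketch, not part of the original answer): let the subquery look further back than the reporting window, so that fill(previous) has an earlier point to carry forward, and restrict only the outer query to the window you want to report on. The lookback horizon (one day here) is an arbitrary assumption, and time-range propagation between inner and outer InfluxQL queries has subtleties, so verify this against your own data:

SELECT integral("filled_light",1h) FROM (SELECT last("light") AS "filled_light" FROM "test3" WHERE time >= '2022-03-17T00:00:00Z' GROUP BY time(1s) fill(previous)) WHERE time >= '2022-03-18T00:00:00Z' AND time < '2022-03-18T05:00:00Z' GROUP BY time(1h)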

In the next step, use the integral() function on the data you got from the subquery, like this:

SELECT integral("filled_light",1h) FROM (SELECT last("light") AS "filled_light" FROM "test3" WHERE $timeFilter GROUP BY time(1s) fill(previous)) GROUP BY time(1h)
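
As a sanity check against the question's example: integral("filled_light",1h) computes the area under the 0/1 curve in hour units, so for the 04:00 bucket the filled series is 1 for the roughly 360 seconds from 04:00:00 until the 0 reported at 04:06:00, giving 360 s / 3600 s = 0.10, which matches the expected result.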

This is how it looks on charts:

[screenshot: the resulting chart, created in Grafana connected to InfluxDB]

And how the result data looks in a table:

[screenshot: the result table, created in Grafana connected to InfluxDB]

This is not a perfect solution, but I hope it resolves your problem.
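
If you would rather have a percentage than a 0..1 fraction, multiplying the integral by 100 should work (a sketch based on the same query):

SELECT integral("filled_light",1h) * 100 FROM (SELECT last("light") AS "filled_light" FROM "test3" WHERE $timeFilter GROUP BY time(1s) fill(previous)) GROUP BY time(1h)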

  • This is awesome - not sure if there is a "perfect way", but does the trick. Could you expand a bit on what the inner GROUP BY granularity is, or should be? Why is it doing a group-by "1s" in the inner query and by "1h" in the outer query? I have indeed tested locally with minor adjustments (field names and other WHERE filters to match our data), and I am getting results ranging from 0 (ok!) to... 1.0000000000000622, while values are all 0 or 1. Is the FILL() adding an extra element to each data point perhaps? – herchu Mar 14 '22 at 18:47
  • "Could you expand a bit on what the inner GROUP BY granularity is, or should be? Why is it doing a group-by "1s" in the inner query and by "1h" in the outer query?" I have used 1s as a standard approach. Subquery just gives to outer query data filled with fill(previous) values (if no values in exact "1s" period exists or none if there was no previous value). Note that If there is a data in exact second, its timestamp will be rounded to "1s" in a new dataset. That's because in real we do not just copy data to a new line, we pick "1s" periods and we pick last value from each of this period. – Crashtein Mar 14 '22 at 22:39
  • All of that is needed because integral() needs data in every outer period (I used 1h periods there) to produce statistics. Of course you can use different periods, but: 1) enlarging the 1s subquery period rounds timestamps down to that period; with a 10s period, for example, turning the light on at 10:21:19.999 would appear in the subquery output as if it happened at 10:21:10.000. It would also miss any other light changes between 10:21:10.000 and 10:21:19.999 (only the last one is kept). – Crashtein Mar 14 '22 at 22:48
  • That is why you should use small periods in the subquery. Not too small, though, because every period produces a new point, and over a long time range that makes query execution very slow. – Crashtein Mar 14 '22 at 22:49
  • 2) The outer period should match the statistics period for which you want the percentage of time the light was On/Off; if you change it to 2h, for example, you get "light on" statistics over 2h periods. It is important that the outer query period is a multiple of the subquery period; otherwise the results may be invalid, because the beginning of an outer period would then contain no value at all (neither 0 nor 1), which is treated as 0 in this case (see the sketch after these comments). – Crashtein Mar 14 '22 at 22:56
  • "I have indeed tested locally with minor adjustments (field names and other WHERE filters to match our data), and I am getting results ranging from 0 (ok!) to... 1.0000000000000622, while values are all 0 or 1." - I think that this is because of float type precision. There is no "exactly" 1 in float. Value 1.0000000000000622 is ~10^-14 accurate... I am not 100% sure. – Crashtein Mar 14 '22 at 23:02
  • Excellent explanations; and now I know I will keep using `1s` in the subquery (and why!). I have already tested other periods in the outer query; in fact that was part of my original use case. As for the float precision -- you are surely right; it's just a byproduct of IEEE-754 precision error... I don't know why I did not think of it earlier. Thanks for all the extra detail; I think all the comments about the queries are helpful for readers (they might even be useful as part of the answer if it ever gets edited). – herchu Mar 15 '22 at 12:13
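
To make the 2h remark concrete (a sketch using the same names as the answer): with GROUP BY time(2h) in the outer query and a 1h integral unit, a fully-On bucket yields 2.0 instead of 1.0, so divide by the bucket length in hours to get back a 0..1 fraction:

SELECT integral("filled_light",1h) / 2 FROM (SELECT last("light") AS "filled_light" FROM "test3" WHERE $timeFilter GROUP BY time(1s) fill(previous)) GROUP BY time(2h)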