Flux aggregate function to calculate the difference from last to first

Question

I'm trying to learn the flux query language for InfluxDB. I'm using InfluxDB OSS 2.7.

I have a time-series with power usage from my power meter. It reports an ever increasing number in KWh, and I want to show how many Wh I have used per day, by using a custom function with aggregateWindow. Here is what I have tried:

myFunc = (tables=<-, column) => {
  a = tables
    |> first(column: column)
    |> findRecord(fn: (key) => true, idx: 0)

  b = tables
    |> last(column: column)
    |> findRecord(fn: (key) => true, idx: 0)

  d = b._value - a._value

  return tables
    |> first()
    |> map(fn: (r) => ({ r with _value: d}))
}

from(bucket: "a")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "el")
  |> filter(fn: (r) => r["_field"] == "ACTIVE_IMPORT")
  |> aggregateWindow(every: 1d, fn: myFunc, createEmpty: false)
  |> yield(name: "Wh")

But this returns a new table, where all _value have the same number (in my case 134).

I was hoping that the variables a and b would have the first and the last value of each window, and that d would represent the usage in each window - but this does not seem to be the case.

score 0 · Answer 1 · answered Apr 12 '23 at 07:13

If you want to find for each window the difference between the first and last values you can use the spread function. To be precise spread calculates the difference between the minimum and maximum values (not first and last) but in an always-increasing series the two are the same.

This though is not precise. If you consider the following data:

timestamp	value
2023-04-12T00:00:00Z	100
2023-04-12T01:00:00Z	101
2023-04-12T02:00:00Z	102
2023-04-12T03:00:00Z	103
2023-04-12T04:00:00Z	104
2023-04-12T05:00:00Z	105
2023-04-12T06:00:00Z	106
2023-04-12T07:00:00Z	107
2023-04-12T08:00:00Z	108
2023-04-12T09:00:00Z	109
2023-04-12T10:00:00Z	110
2023-04-12T11:00:00Z	111
2023-04-12T12:00:00Z	112
2023-04-12T13:00:00Z	113
2023-04-12T14:00:00Z	114
2023-04-12T15:00:00Z	115
2023-04-12T16:00:00Z	116
2023-04-12T17:00:00Z	117
2023-04-12T18:00:00Z	118
2023-04-12T19:00:00Z	119
2023-04-12T20:00:00Z	120
2023-04-12T21:00:00Z	121
2023-04-12T22:00:00Z	122
2023-04-12T23:00:00Z	123
2023-04-13T00:00:00Z	124
2023-04-13T01:00:00Z	125
2023-04-13T02:00:00Z	126
2023-04-13T03:00:00Z	127
2023-04-13T04:00:00Z	128
2023-04-13T05:00:00Z	129
2023-04-13T06:00:00Z	130
2023-04-13T07:00:00Z	131
2023-04-13T08:00:00Z	132
2023-04-13T09:00:00Z	133
2023-04-13T10:00:00Z	134
2023-04-13T11:00:00Z	135
2023-04-13T12:00:00Z	136
2023-04-13T13:00:00Z	137
2023-04-13T14:00:00Z	138
2023-04-13T15:00:00Z	139
2023-04-13T16:00:00Z	140
2023-04-13T17:00:00Z	141
2023-04-13T18:00:00Z	142
2023-04-13T19:00:00Z	143
2023-04-13T20:00:00Z	144
2023-04-13T21:00:00Z	145
2023-04-13T22:00:00Z	146
2023-04-13T23:00:00Z	147

Then if you take the spread in each day you will get:

from(bucket: "a")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "el")
  |> filter(fn: (r) => r["_field"] == "ACTIVE_IMPORT")
  |> aggregateWindow(every: 1d, fn: spread, createEmpty: false)
  |> yield(name: "Wh")

day	first	last	spread
2023-04-13T00:00:00Z	100	123	23
2023-04-13T00:00:00Z	124	147	23

The actual consumption for each day, instead is 24. By using spread you are missing one interval for each day (the consumption from 123 to 124 is never accounted). Of course if your data has higher granulatiry (e.g.: every minute or every second) the missing value will be a lot less significant.

To solve this I would suggest just getting a single value for each day (the last) and then using the difference function. This will then do a "rolling difference" so subtract to each value the previous one and will give you a better result (all hours will be accounted for).

from(bucket: "a")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "el")
  |> filter(fn: (r) => r["_field"] == "ACTIVE_IMPORT")
  |> aggregateWindow(every: 1d, fn: last, createEmpty: false)
  |> difference()
  |> yield(name: "Wh")

Aggregating with last will give:

day	last
2023-04-13T00:00:00Z	123
2023-04-13T00:00:00Z	147

Then applying difference will result in:

day	difference(last)
2023-04-13T00:00:00Z
2023-04-13T00:00:00Z	24

NOTE: the first value will be null since it does not have anything before it to di the difference with

Flux aggregate function to calculate the difference from last to first

1 Answers1