Take the median of a grouped set

Question

I am quite new to Flux and want to solve an issue:

I have a bucket containing measurements that are generated by a worker service.

Each measurement belongs to a site and has an identifier (UUID). Each measurement contains three measurement points, each holding a value.

What I want to achieve now is the following: create a graph/list/table of measurements for a specific site and reduce the three measurement points of each measurement to their median value.

TL;DR:

Get all measurement points that belong to the specific site UUID.
Since each measurement has a UUID and contains three measurement points, group by measurement and take the median of each group.
Return a result that only contains the median value for each measurement.

This does not work:

from(bucket: "test")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "lighthouse")
  |> filter(fn: (r) => r["_field"] == "speedindex")
  |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
  |> group(columns: ["measurement"])
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")

This does not throw an error, but of course it does not take the median of each group.

This is the result (simple table):

score 1 · Accepted Answer · answered Nov 18 '22 at 12:38

If I understand your question correctly, you want a single number to be returned. In that case you'll want to use the mean() function:

from(bucket: "test")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "lighthouse")
  |> filter(fn: (r) => r["_field"] == "speedindex")
  |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
  |> group(columns: ["measurement"])
  |> mean()
  |> yield(name: "mean")
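Since the question asks for the median rather than the mean, here is a minimal sketch of the same query with median() swapped in. It assumes, as the original query does, that the UUID tag is called `measurement`. The `method: "exact_mean"` choice is also an assumption: it computes an exact median (averaging the two middle values when a group has an even count), whereas the default `estimate_tdigest` method returns an estimate.

```flux
from(bucket: "test")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "lighthouse")
  |> filter(fn: (r) => r["_field"] == "speedindex")
  |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
  // One table per measurement UUID (tag name "measurement" taken from the original query)
  |> group(columns: ["measurement"])
  // Exact median of each table's _value column instead of the t-digest estimate
  |> median(method: "exact_mean")
  |> yield(name: "median")
```

Like mean(), this returns one row per measurement group.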

The aggregateWindow function aggregates your values over (multiple) windows of time. The script you posted computes the mean over each v.windowPeriod (in this case 20 minutes).
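For comparison, a sketch of the question's windowed approach with median as the window function. This still produces one value per v.windowPeriod window per group rather than one value per measurement; the filters are the question's own, and combining the two field/measurement filters into one is just a stylistic choice.

```flux
from(bucket: "test")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "lighthouse" and r["_field"] == "speedindex")
  |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
  |> group(columns: ["measurement"])
  // One median per time window per group; _time is set to each window's stop time by default
  |> aggregateWindow(every: v.windowPeriod, fn: median, createEmpty: false)
  |> yield(name: "median_per_window")
```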

I am not entirely sure what v.windowPeriod represents, but I usually use time literals for all times (including start and stop); I find it easier to understand how the query relates to the result that way.
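A sketch of the same query using literals instead of dashboard variables. The dates and the 20-minute window below are placeholders chosen for illustration, not values taken from the question; a relative start such as `range(start: -7d)` would work as well.

```flux
from(bucket: "test")
  // Absolute RFC3339 time literals (hypothetical dates) instead of v.timeRangeStart/v.timeRangeStop
  |> range(start: 2022-11-18T00:00:00Z, stop: 2022-11-19T00:00:00Z)
  |> filter(fn: (r) => r["_measurement"] == "lighthouse" and r["_field"] == "speedindex")
  |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
  |> group(columns: ["measurement"])
  // Duration literal instead of v.windowPeriod
  |> aggregateWindow(every: 20m, fn: mean, createEmpty: false)
  |> yield(name: "mean")
```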

On a side note: the yield function only names your result and allows a single query to return multiple results; it does not compute anything.
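To illustrate, a sketch of one script returning two named results. The stream is assigned to a variable `data` (a name chosen here for illustration) and then piped into two separate yield() calls; the computation happens in mean() and median(), not in yield().

```flux
data = from(bucket: "test")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "lighthouse" and r["_field"] == "speedindex")
  |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")
  |> group(columns: ["measurement"])

// Two independent outputs from the same input; yield() only labels each one
data |> mean() |> yield(name: "mean")
data |> median() |> yield(name: "median")
```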

Thank you, that was very helpful. I saw that using `mean()` removes the `_time` from the results. Is there any way to prevent that? — SPQRInc, Nov 18 '22 at 13:55
@SPQRInc yes, but what would you expect the time value to be? After all, you're taking an average over a group of values, so you no longer have any specific value, hence no timestamp identifying that value. You could take the median; I believe it preserves the time, but I'm unsure. Or you could set it to some value, i.e. start, stop, the value in between, or maybe the average over all time values. You would probably use the map function to do so — tomsCodingCode, Nov 19 '22 at 17:35
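Both suggestions from the comment can be sketched in Flux, under a few assumptions. Option 1 relies on `median(method: "exact_selector")` acting as a selector that returns an actual stored row, including its `_time`; the default method aggregates and drops `_time` just like mean() does. Option 2 keeps a timestamp for mean(): an aggregate only keeps group-key columns plus `_value`, so `_start` and `_stop` are added to the group key (they are constant after range(), so the grouping is unchanged), and `duplicate()` is used here in place of map() to copy `_stop` into a new `_time` column.

```flux
data = from(bucket: "test")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "lighthouse" and r["_field"] == "speedindex")
  |> filter(fn: (r) => r["site"] == "1d1a13a3-bb07-3447-a3b7-d8ffcae74045")

// Option 1: the selector variant of median() returns the selected row, _time included
data
  |> group(columns: ["measurement"])
  |> median(method: "exact_selector")
  |> yield(name: "median_with_time")

// Option 2: keep _start/_stop in the group key so they survive mean(),
// then use the query's stop time as the timestamp
data
  |> group(columns: ["measurement", "_start", "_stop"])
  |> mean()
  |> duplicate(column: "_stop", as: "_time")
  |> yield(name: "mean_at_stop")
```

Using `_stop` as the timestamp mirrors what aggregateWindow does by default.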
