
I have 2 measurements as follows:

metrics,app=app1 cpu=10 1654150510
metrics,app=app1 cpu=12 1654150512
metrics,app=app1 cpu=13 1654150514
metrics,app=app1 cpu=14 1654150516
metrics,app=app1 cpu=15 1654150519

The "metrics" measurement arrives about once every 2–3 seconds. The second one is:

http_requests,app=app1 response_time=12 1654150509
http_requests,app=app1 response_time=11 1654150510
http_requests,app=app1 response_time=15 1654150511
http_requests,app=app1 response_time=14 1654150512
http_requests,app=app1 response_time=13 1654150513
http_requests,app=app1 response_time=10 1654150514
http_requests,app=app1 response_time=12 1654150515
http_requests,app=app1 response_time=11 1654150516
http_requests,app=app1 response_time=13 1654150517
http_requests,app=app1 response_time=12 1654150518

The http_requests measurement arrives about once per second.

I want to combine the 2 metrics into a single table.

_time,value_cpu,value_response_time
1654150509,10,12
1654150510,10,11
1654150511,12,15

As the timestamps may differ, is there a way to combine them in Flux? Is fill() the way? I'm not sure if timeShift() will help here, although I didn't understand it completely. I assume some sort of downsampling is needed (not sure how to do that in Flux either). Is there a way to match the measurements based on the closest time difference? I.e., if response measurements came in at time instances

1654150510,app=app1 response_time=10
1654150513,app=app1 response_time=12
1654150514,app=app1 response_time=11
1654150516,app=app1 response_time=13

and CPU came in at

1654150512,app=app1 cpu=20
1654150515,app=app1 cpu=30

Then resulting table is

_time,response_time,cpu
1654150510,10,
1654150513,12,20
1654150514,11,
1654150516,13,30

Each CPU value joins to the point with the closest timestamp (+/- difference). How can this be achieved in Flux in InfluxDB?

Vipin Menon

1 Answer


I guess downsampling with aggregateWindow() and fill() could work.
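A rough sketch of that approach (the bucket name "stackx" and the 2-second window are assumptions; adjust them to your data):

from(bucket: "stackx")
 |> range(start: -1d)
 |> filter(fn: (r) => r._measurement == "metrics" or r._measurement == "http_requests")
 |> aggregateWindow(every: 2s, fn: mean, createEmpty: true) // downsample both series onto a common 2s grid
 |> fill(usePrevious: true) // carry the last value into empty windows
 |> drop(columns: ["_measurement"]) // so both fields pivot into the same table
 |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

This aligns both series onto the same window boundaries before pivoting, at the cost of producing a row for every window, whether or not either measurement had a point there.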

An alternative is to pivot first and then fill the missing values with the previous value. The advantage, at least from a performance point of view, is that when neither measurement has a record at a given time, no new row filled with previous values is created.

With

from(bucket: "stackx")
 |> range(start: -1d)
 |> filter(fn: (r) => r._measurement == "metrics" or r._measurement == "http_requests")
 |> drop(columns: ["_measurement"]) // or remove from group other way
 |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
 |> sort(columns: ["_time"], desc: false)
 |> fill(column: "cpu", usePrevious: true)
 |> fill(column: "response_time", usePrevious: true)

the result would be

_time,app,cpu,response_time
2022-06-02T06:15:09Z,app1,,12
2022-06-02T06:15:10Z,app1,10,11
2022-06-02T06:15:11Z,app1,10,15
2022-06-02T06:15:12Z,app1,12,14
2022-06-02T06:15:13Z,app1,12,13
2022-06-02T06:15:14Z,app1,13,10
2022-06-02T06:15:15Z,app1,13,12
2022-06-02T06:15:16Z,app1,14,11
2022-06-02T06:15:17Z,app1,14,13
2022-06-02T06:15:18Z,app1,14,12
2022-06-02T06:15:19Z,app1,15,12
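If you want exactly the column shape from the question, you could additionally restrict the output with keep() (a standard Flux function; the column names follow from the pivot above):

 |> keep(columns: ["_time", "cpu", "response_time"])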
alespour