
In my setup I have a Java component that reads data from the YARN manager and exposes the results of various jobs as metrics. For example, I have a job duration metric that simply holds the duration of the last app run. It may look like this:

duration_time_millis{job="probe",app_name="import-results",app_type="MAPREDUCE",status="SUCCEEDED"}
1991392 @1542770979.823
1991392 @1542770994.823
1991392 @1542771009.823
...
265722 @1542781554.823
265722 @1542781569.823
265722 @1542781584.823
...

The thing is, I scrape the exposing server every 15 s or so, but the jobs run irregularly, once every several hours. That means that over the past 6 hours I get the first value 563 times and the second value 520 times, even though there was only one change in the interval.

Is there a way to compute avg or stddev only on distinct values? Getting the number of distinct values would also allow better handling in histograms and heatmaps in Grafana, where count_values does not seem to be a good solution.
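To make the goal concrete, here is a minimal client-side sketch in Python of the kind of result I am after. It assumes the samples have already been fetched as (timestamp, value) pairs ordered by time; `collapse_runs` is a hypothetical helper, not a Prometheus function. It also assumes that each run of identical consecutive values corresponds to one job run, so two back-to-back runs with exactly the same duration would be merged into one.

```python
from itertools import groupby
from statistics import mean, pstdev


def collapse_runs(samples):
    """Keep only the first sample of each run of equal consecutive values.

    `samples` is a list of (timestamp, value) pairs ordered by timestamp.
    Hypothetical helper for illustration only.
    """
    return [next(group) for _, group in groupby(samples, key=lambda s: s[1])]


# Sample data taken from the scrape output shown above.
samples = [
    (1542770979.823, 1991392),
    (1542770994.823, 1991392),
    (1542771009.823, 1991392),
    (1542781554.823, 265722),
    (1542781569.823, 265722),
    (1542781584.823, 265722),
]

runs = collapse_runs(samples)
values = [value for _, value in runs]

distinct_count = len(values)     # number of job runs seen in the interval
avg_duration = mean(values)      # average over runs, not over scrapes
stddev_duration = pstdev(values) # population standard deviation over runs
print(distinct_count, avg_duration, stddev_duration)
```

With this approach each job run contributes exactly one value, regardless of how many times it was scraped. What I am looking for is an equivalent that works inside PromQL on a range vector.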

Thanks for any help on this!

Milano Nicolum
  • You seem to be on the right track with `count_values`. To get the current number of distinct values for a metric you could use something like `count(count_values("hi there stack overflow", up))`. I don't think there is currently any PromQL function that would do anything like `count_values_over_time`, so there is no way that I am aware of to calculate `avg` or `avg_over_time` based on unique values. Sorry to break it to ya :( – wbh1 Nov 21 '18 at 15:41
  • What a pity. If I check only one time series, `count_values` always returns `1`, as there is only one value at a time. And since there is no such function working with a range vector, I cannot get much useful data for the selected interval. Though I am a bit surprised there is no workaround, at least for such a simple query. – Milano Nicolum Nov 22 '18 at 08:51
