
I'm struggling to understand the concept of a gauge in statsd. Please explain how it works and give some examples of when it can be useful.

The doc is not really clear to me

Set a gauge value.

stat: the name of the gauge to set.
value: the current value of the gauge.
rate: a sample rate, a float between 0 and 1. Will only send data this percentage of the time. The statsd server does not take the sample rate into account for gauges. Use with care.
delta: whether or not to consider this a delta value or an absolute value. See the gauge type for more detail.
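To make the delta option more concrete, here is a minimal sketch of what a gauge looks like on the wire, assuming the standard StatsD line format `<name>:<value>|g`, where a leading `+` or `-` marks a relative change. `formatGauge` is a hypothetical helper written for illustration, not part of any client library.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatGauge builds one StatsD gauge line ("<name>:<value>|g").
// Hypothetical helper for illustration only.
// When delta is true, a non-negative value gets an explicit "+" so the
// server treats it as a change relative to the gauge's current value.
func formatGauge(name string, value float64, delta bool) string {
	v := strconv.FormatFloat(value, 'f', -1, 64)
	if delta && value >= 0 {
		v = "+" + v
	}
	return name + ":" + v + "|g"
}

func main() {
	fmt.Println(formatGauge("machine01.threads", 42, false)) // absolute: set to 42
	fmt.Println(formatGauge("machine01.threads", 3, true))   // delta: add 3
	fmt.Println(formatGauge("machine01.threads", -5, true))  // delta: subtract 5
}
```

Because the sign is what marks a delta, a negative absolute value cannot be sent directly in this format; the usual workaround is to set the gauge to 0 first and then send the negative delta.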

Jerry

1 Answer


A gauge simply reflects the current state of your system, or any metric you don't want to be aggregated at all.

Let me give you some examples.

1) In your program, you can use a language-specific API to find out how much memory the process is using. For example, in Go:

// Read the Go runtime's current memory statistics.
var memStat runtime.MemStats
runtime.ReadMemStats(&memStat)
heapAlloc := memStat.HeapAlloc
heapInuse := memStat.HeapInuse
heapObjects := memStat.HeapObjects
// Report each reading as a gauge: only the latest value matters.
statsd.Gauge("machine01.memory.heap.alloc", heapAlloc)
statsd.Gauge("machine01.memory.heap.inuse", heapInuse)
statsd.Gauge("machine01.memory.heap.objects", heapObjects)

For simplicity, you can regard these metrics as the memory usage at the moment your code invokes the runtime API. Gauges fit here because each value on its own shows the memory usage for one flush interval, which is 10 seconds by default in StatsD. You also don't need any aggregation method for these metrics, because an aggregation such as sum wouldn't make any sense.

Besides the case above, gauges have many other use cases: CPU usage, OS system load, the number of threads in your process, the number of open connections on your server, or the number of active transactions in your trading system right now.

2) Sometimes we can also use a gauge to track the time when something happened. For example:

res, err := something()
if err != nil {
   statsd.Gauge("machine01.something.error", time.Now().Unix())
}

So once the error happens, you can spot it by looking at the line on your Graphite dashboard. You can also estimate how often it occurs from the shape of the line.

pfctgeorge
  • "aggregation, like sum, doesn't make any sense" - but what about server farms? Let's say you have something like a Redis cluster and you want to gauge the total available memory across the cluster. Is that kind of aggregation built in, or is it done by the dashboard app (like DataDog)? – sergiopereira Aug 15 '22 at 18:41
  • Two options come to mind: 1) use one count metric for the whole cluster and have every Redis node emit to it: statsd.Count("cluster.mycluster.totalAvailableMemory", xxx); 2) use one gauge metric per node: statsd.Gauge("cluster.mycluster.node.hostName1.totalAvailableMemory", xxx). Personally, I would recommend option 2, as you wouldn't lose visibility of each node and, like you said, you could easily use the dashboard app to aggregate all metrics of the cluster in a graph. – pfctgeorge Aug 17 '22 at 00:51
  • pfctgeorge thanks for the suggestions. I'm having to implement something like this and I was leaning towards option 2 myself. I was just not sure if I'd run into problems trying to aggregate in the dashboard itself. – sergiopereira Aug 20 '22 at 15:05
  • That would be an issue if you have very many nodes (say over 1k), where the dashboard needs to pull thousands of metrics and then aggregate them, so loading the dashboard might take tens of seconds. In that case you could combine options 1 and 2: put the metric from option 1 on the dashboard and look into the individual per-node metrics whenever you need to. – pfctgeorge Aug 23 '22 at 05:03