
What is the motivation for a Micrometer counter to be a double, not a long? Is there a reason a user would count partial changes? How does it continue to measure increments after the count is over the size of the mantissa (~2 billion I believe)?

Update: It was pointed out that a double maintains exact integer precision up to about 9 quadrillion (2^53), not 2 billion. That makes for plenty more space.
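A quick way to see where that boundary sits (a minimal standalone sketch, added here for illustration; it is not from the original question):

```java
public class DoublePrecisionDemo {
    public static void main(String[] args) {
        long limit = 1L << 53; // 9,007,199,254,740,992: the last exactly representable integer run
        System.out.println((double) limit);                         // 9.007199254740992E15 (exact)
        System.out.println((double) limit + 1.0);                   // still 9.007199254740992E15: the +1 is lost
        System.out.println((double) limit + 1.0 == (double) limit); // true
    }
}
```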

Adam
  • Java (and many other things using IEEE or -ish) `double` provides an effective 53 bits of precision (except for denormals, not relevant here), which is about 9 quadrillion. If it counts 1 per nanosecond, it will lose counts (or at least become inexact) after 3 months. – dave_thompson_085 Sep 23 '19 at 20:25
  • Ok, this was some of my concern. We may be counting 1 million things a second for ~8 hrs a day. I was concerned that if the system stays up for too long, that could be an issue when aggregated. However, some napkin math shows "long" to be 200-ish years. – Adam Sep 25 '19 at 19:32
  • @Adam if you are curious, I actually opened a ticket (that may very well be closed). https://github.com/micrometer-metrics/micrometer/issues/1925. I hope it is at least considered though it would be a HUGE change to get to that eventually. – Dean Hiller Mar 21 '20 at 12:59

1 Answer


There are several reasons that counters are doubles instead of longs. As with any architectural decision, it is a matter of balancing trade-offs.

Maximum compatibility with underlying metrics libraries

Micrometer is a facade over other metrics frameworks and double is the most versatile option.

Prometheus, for example, uses a double for its counters, and by sticking with the same type Micrometer maximizes compatibility.

A further reason Prometheus uses doubles is to treat all metrics the same internally, which simplifies its architecture and optimizes its performance and memory utilization. That is a different topic for discussion, though.

Measuring partial changes

While most counter uses increment single events, counters are still a viable choice when measuring other things, like bytes received or rows processed. None of these examples are partial counts, but that doesn't stop a user from coming up with a use case where they want to measure partial amounts. One example might be 'seconds spent processing', since partial seconds are very common in computing. Some systems measure whole units like milliseconds or nanoseconds instead; let's cover that next.
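As a concrete illustration, here is a minimal sketch of a fractional-increment counter. The meter name `jobs.processing.seconds` is made up for this example, but `Counter.increment(double)` is the actual Micrometer API:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class PartialCountExample {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        Counter secondsSpent = Counter.builder("jobs.processing.seconds") // hypothetical meter name
                .description("Total seconds spent processing jobs")
                .register(registry);

        // Each unit of work contributes a fractional number of seconds.
        secondsSpent.increment(0.25);
        secondsSpent.increment(1.75);

        System.out.println(secondsSpent.count()); // 2.0
    }
}
```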

Maximum precision

The double type gives more precision. While this might sound counter-intuitive, bear with me. Some durations that could be measured in a counter would be total time spent performing some task: garbage collection, DB processing, etc. Some of those events take nanoseconds. While I wouldn't recommend Micrometer as a replacement for a profiler, measuring in overly small time units would cause confusion (see Who Wants Seconds, where a Prometheus engineer explains this reasoning further).

If a registry were to standardize on nanoseconds for all measurements, the extra three orders of magnitude that a long offers (about 9 quintillion for a long versus 9 quadrillion of exact integers for a double) would be consumed by the nanosecond granularity anyway.
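Putting rough numbers on that (my own back-of-the-envelope arithmetic, not from the original answer):

```java
public class NanosecondBudget {
    public static void main(String[] args) {
        double doubleExactLimit = Math.pow(2, 53); // ~9.0e15: where exact integer counting ends
        double longLimit = Long.MAX_VALUE;         // ~9.2e18

        double nanosPerDay = 86_400d * 1_000_000_000d;
        // Accumulating elapsed time as whole nanoseconds:
        System.out.printf("double stays exact for ~%.0f days%n", doubleExactLimit / nanosPerDay); // ~104 days
        System.out.printf("long overflows after ~%.0f years%n", longLimit / nanosPerDay / 365);   // ~292 years
    }
}
```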

You'll note that Micrometer uses doubles to measure total duration in timers, and the beauty of that metric is that it is actually a counter at its core (a monotonically incrementing number).
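For example (a small sketch; the timer name `db.query` is invented for illustration), a timer's `totalTime` only ever grows, like a counter, and is reported as a double:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

import java.time.Duration;
import java.util.concurrent.TimeUnit;

public class TimerAsCounter {
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        Timer timer = Timer.builder("db.query").register(registry); // hypothetical timer name

        timer.record(Duration.ofMillis(1));
        timer.record(Duration.ofNanos(250));

        // The total only ever increases, and fractions of a second are preserved.
        System.out.println(timer.totalTime(TimeUnit.SECONDS)); // ≈ 0.00100025
    }
}
```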

A double is plenty large

I recall doing some rough math: if a counter is measuring single events, it could count 10,000 events every second for over 20,000 years before beginning to lose precision.
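Reconstructing that math (my own arithmetic, under the same assumptions):

```java
public class DoubleHeadroom {
    public static void main(String[] args) {
        double exactLimit = Math.pow(2, 53);       // ~9.0e15 exact integer increments available
        double perYear = 10_000d * 86_400d * 365d; // 10,000 increments/second, every second of the year
        System.out.printf("~%.0f years%n", exactLimit / perYear); // ~28,500 years
    }
}
```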

So there certainly are limitations that are good to be aware of, but given the expected needs of the system, a double is quite sufficient.

checketts