I have an API that processes collections. The execution time of this API is related to the collection size (the larger the collection, the longer it takes).
I am researching how I can measure this with Prometheus, but I am unsure whether I am doing things correctly (the documentation is a bit lacking in this area).
The first thing I did was define a Summary metric to measure the execution time of the API. I am using the canonical rate(sum)/rate(count) approach as explained here.
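For reference, the Summary is registered roughly like this (a minimal sketch assuming the Prometheus Java simpleclient; the metric name bulk_request_duration_seconds is illustrative, not my real name):

import io.prometheus.client.Summary;

// summary of bulk request durations, labeled the same way as the histogram below
static final Summary summary = Summary.build()
        .name("bulk_request_duration_seconds")
        .help("execution time of bulk API calls")
        .labelNames("method", "entity")
        .register();

and the Grafana panel charts the average duration with:

rate(bulk_request_duration_seconds_sum[5m])
  / rate(bulk_request_duration_seconds_count[5m])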
Now, since I know that the latency may be affected by the size of the input, I also want to overlay the request size on the average execution time. Since I don't want to track every possible size individually, I figured I'd use a histogram, like so:
import io.prometheus.client.Histogram;

Histogram histogram = Histogram.build()
        // bucket upper bounds: requests of up to 10, 30 and 50 items
        .buckets(10, 30, 50)
        .name("BULK_REQUEST_SIZE")
        .help("histogram of bulk sizes to correlate with duration")
        .labelNames("method", "entity")
        .register();
Note: the term 'size' does not refer to the size in bytes but to the length of the collection that needs to be processed: 2 items, 5 items, 50 items, and so on.
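The plan for the overlay is to chart the average bulk size next to the average duration, with a query along these lines (a sketch; the 5m window is arbitrary):

rate(BULK_REQUEST_SIZE_sum[5m]) / rate(BULK_REQUEST_SIZE_count[5m])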
In the request handler I then do (simplified):
@PUT
void process(Collection<Entity> entitiesToProcess, String entityName) {
    // time the processing of the whole bulk request
    Summary.Timer t = summary.labels("PUT_BULK", entityName).startTimer();
    // process...
    t.observeDuration();
    // record how many items this request carried
    histogram.labels("PUT_BULK", entityName).observe(entitiesToProcess.size());
}
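One refinement I am considering (not shown in the simplification above): closing the timer in a finally block so the duration is still observed when processing throws, something like:

Summary.Timer t = summary.labels("PUT_BULK", entityName).startTimer();
try {
    // process...
    histogram.labels("PUT_BULK", entityName).observe(entitiesToProcess.size());
} finally {
    // record the duration even if processing fails
    t.observeDuration();
}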
Questions:
- Later, when I look at BULK_REQUEST_SIZE_bucket in Grafana, I see that all buckets have the same value, so clearly I am doing something wrong (the panel query is sketched after this list).
- Is there a more canonical way to do it?
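For concreteness, the per-bucket view I'm describing comes from a query along these lines (a sketch, not my exact panel):

sum by (le) (rate(BULK_REQUEST_SIZE_bucket[5m]))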