I need to compare the error rate of two different HTTP client versions with that of a legacy client, for each possible combination of status code and client version.
As an example of what I'm trying to achieve, consider that I have the following results for the legacy client:
legacy HTTP response status | % of responses with that status |
---|---|
401 | 0.02 |
503 | 0.01 |
and the following results for v1:
v1 HTTP response status | % of responses with that status |
---|---|
401 | 0.01 |
503 | 0.04 |
then I want a query that would give me the following results:
status | v1 / legacy error rate ratio |
---|---|
401 | 0.5 |
503 | 4 |
In other words, I want the query to show that the error rate for v1 on the 503 status code is 4x the legacy error rate for that status.
I have a metric `http_response` that has the following labels:
- `status`: the HTTP response status code
- `version`: the HTTP client version (whose values can be `legacy`, `v1` or `v2`)
I came up with the following query, but I get an "Empty query result" response when I try to run it through Thanos:
sum(rate(http_response{status=~"[45]..", version=~"v2|v1"}[15m])) by (version, status) / on(version)
group_left
sum(rate(http_response{version=~"v2|v1"}[15m])) by (version) / on (status)
group_left
sum(rate(http_response{status=~"[45]..", version="legacy"}[15m])) by (version, status) / on(version)
group_left
sum(rate(http_response{version="legacy"}[15m])) by (version)
Am I missing something?
Thanks in advance!
EDIT:
Each query that calculates the error rate for a single client does return results on its own. For example:
sum(rate(http_response{status=~"[45]..", version=~"v2|v1"}[15m])) by (version, status) / on(version)
group_left
sum(rate(http_response{version=~"v2|v1"}[15m])) by (version)
gives me:
result | value |
---|---|
{status="503", version="v1"} | 0.018 |
{status="503", version="v2"} | 0.033 |
{status="400", version="v1"} | 0.002 |
while the one that determines the ratio for the `legacy` client gives me:
result | value |
---|---|
{status="503", version="legacy"} | 0.016 |
{status="400", version="legacy"} | 0 |
{status="500", version="legacy"} | 0.001 |
But I can't manage to make the original query work. I wonder whether the 0 in the 400 status code row could be the problem.
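
A possible explanation, offered as a sketch rather than a confirmed fix: PromQL binary operators of equal precedence are left-associative, so the chained divisions in the original query are evaluated as `((a / b) / c) / d` instead of `(a / b) / (c / d)`. The last division would then try to match `v1`/`v2` series against the `legacy` total `on(version)`, and since no `version` value is shared, the result is empty. The variant below adds explicit parentheses. It assumes the metric and labels described above and that `http_response` is a counter, and it has not been run against the actual data.

```promql
# Error rate per (version, status) for v1 and v2
(
    sum by (version, status) (rate(http_response{status=~"[45]..", version=~"v1|v2"}[15m]))
  / on (version) group_left
    sum by (version) (rate(http_response{version=~"v1|v2"}[15m]))
)
# divided, per status, by the legacy error rate for that status
/ on (status) group_left
(
    sum by (version, status) (rate(http_response{status=~"[45]..", version="legacy"}[15m]))
  / on (version) group_left
    sum by (version) (rate(http_response{version="legacy"}[15m]))
)
```

With this grouping, each result series keeps the `version` and `status` labels of the left-hand side. A status that appears on only one side (such as 500 in the legacy output above) produces no result series. A legacy rate of 0 should not empty the result by itself: floating-point division by zero yields `+Inf` (or `NaN` for 0 / 0) rather than dropping the series.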