I need to compare the error rate of two different HTTP client versions with that of a legacy client, for each possible combination of status code and client version.

As an example of what I'm trying to achieve, consider that I have the following results for the legacy client:

| legacy HTTP response status | % of responses with that status |
|---|---|
| 401 | 0.02 |
| 503 | 0.01 |

and the following results for v1:

| v1 HTTP response status | % of responses with that status |
|---|---|
| 401 | 0.01 |
| 503 | 0.04 |

then I want a query that would give me the following results:

| status | v1 / legacy error rate ratio |
|---|---|
| 401 | 0.5 |
| 503 | 4 |

In other words, I want the query to show that, for status code 503, the v1 error rate is 4x the legacy error rate.

I have a metric http_response that has the following labels:

  • status: the HTTP response status code
  • version: the HTTP client version (whose values can be legacy, v1 or v2)

I came up with the following query, but I get an "Empty query result" response when I try to run it through Thanos:

sum(rate(http_response{status=~"[45]..", version=~"v2|v1"}[15m])) by (version, status) / on(version)
group_left
sum(rate(http_response{version=~"v2|v1"}[15m])) by (version) / on (status)
group_left
sum(rate(http_response{status=~"[45]..", version="legacy"}[15m])) by (version, status) / on(version)
group_left
sum(rate(http_response{version="legacy"}[15m])) by (version)

Am I missing something?

Thanks in advance!

EDIT:

Each sub-query that calculates the error rate for one side on its own does return results. For example, the v1/v2 part:

sum(rate(http_response{status=~"[45]..", version=~"v2|v1"}[15m])) by (version, status) / on(version)
group_left
sum(rate(http_response{version=~"v2|v1"}[15m])) by (version)

gives me:

| result | value |
|---|---|
| {status="503", version="v1"} | 0.018 |
| {status="503", version="v2"} | 0.033 |
| {status="400", version="v1"} | 0.002 |

while the one that determines the error rate for the legacy client gives me:

| result | value |
|---|---|
| {status="503", version="legacy"} | 0.016 |
| {status="400", version="legacy"} | 0 |
| {status="500", version="legacy"} | 0.001 |

However, I can't get the combined query to work. I wonder whether the 0 on the legacy 400 line could be the problem.
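Since each half returns data on its own, one possibility worth ruling out is operator precedence. PromQL evaluates a chain of `/` operators left to right, so without parentheses the last `/ on(version) group_left` is applied between the v1/v2 result and the legacy total, and those two sides share no `version` value to match on. The sketch below wraps each error rate in parentheses before dividing one by the other. It assumes `status=~"[45].."` (a regex match) is what was intended, and it has not been run against real data. A legacy rate of 0, as on the 400 line, would be expected to yield `+Inf` rather than an empty result.

```promql
# v1/v2 error rate per (version, status)
(
    sum(rate(http_response{status=~"[45]..", version=~"v2|v1"}[15m])) by (version, status)
  / on (version) group_left
    sum(rate(http_response{version=~"v2|v1"}[15m])) by (version)
)
# divided by the legacy error rate for the same status;
# the explicit empty list in group_left () keeps the next "(" from
# being parsed as group_left's label list
/ on (status) group_left ()
(
    sum(rate(http_response{status=~"[45]..", version="legacy"}[15m])) by (version, status)
  / on (version) group_left
    sum(rate(http_response{version="legacy"}[15m])) by (version)
)
```

With this shape, the result should keep the `version` and `status` labels of the v1/v2 side. A status that appears on only one side, such as 500 in the legacy table, has no match and would be dropped.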

c0tonet
  • Have you tried debugging? At which step do you begin losing values? – markalex Aug 06 '23 at 18:22
  • Also, are you sure `rate` is what you wanted to use? Shouldn't it maybe be `increase`? 15m seems a bit unusual for rate. – markalex Aug 06 '23 at 18:28
  • Hi @markalex, I've broken down the original query and edited the question with the individual results. I only fail to get results when I combine both queries. As for your suggestion of switching to `increase`, I don't think it's an option for me, as I'm really interested in comparing the per-second rate of errors. – c0tonet Aug 06 '23 at 21:30
  • Presumably labels in the last table have a typo and all versions there are `legacy`? – markalex Aug 06 '23 at 21:33
  • You're right, fixed it. thanks – c0tonet Aug 06 '23 at 22:06

0 Answers