1

I'm learning PostgreSQL and I have a database log that has these columns: path text, ip inet, method text, status text, time timestamptz, id int

Using this database my goal is to find any days that have a error percent (status != '200 OK' / status) of over 2.5 percent. I have the query, seen below, that gets to the error percent as a decimal but I can't figure out how to multiply them by 10 to make them a 'percent' nor how to check if they are over 2.5.

I have tried both HAVING and WHERE in multiple spots but I either get a Programming error at group by or a syntax error wherever I enter the HAVING or WHERE. What's the correct way to change it to a percent and check if its over 2.5? Thanks in advance

Mock data:

CREATE TABLE log (
    path text,
    ip inet,
    method text,
    status text,
    "time" timestamp with time zone DEFAULT now(),
    id integer NOT NULL
);

INSERT INTO log VALUES ('/article/bears-love-berries', '198.51.100.76' , 'GET', '200 OK', '2016-07-1 12:54:22+00', 3355597);
INSERT INTO log VALUES ('/article/candidate-is-jerk', '198.51.100.76' , 'GET', '200 OK', '2016-07-1 19:54:22+00', 3355598);
INSERT INTO log VALUES ('/article/bears-love-berries', '198.51.100.76' , 'GET', '404 NOT FOUND', '2016-07-1 22:54:22+00', 3355599);
INSERT INTO log VALUES ('/article/goats-eat-googles', '203.0.113.42' , 'GET', '200 OK', '2016-07-11 12:54:22+00', 3355600);
INSERT INTO log VALUES ('/article/candidate-is-jerk', '203.0.113.42' , 'GET', '200 OK', '2016-07-11 17:54:22+00', 3355601);
INSERT INTO log VALUES ('/article/bears-love-berries', '203.0.113.42' , 'GET', '200 OK', '2016-07-11 19:54:22+00', 3355602);
INSERT INTO log VALUES ('/article/goats-eat-googles', '203.0.113.42' , 'GET', '404 NOT FOUND', '2016-07-15 13:54:22+00', 3355603);
INSERT INTO log VALUES ('/article/bears-love-berries', '198.51.100.34' , 'GET', '200 OK', '2016-07-15 20:54:22+00', 3355604);
INSERT INTO log VALUES ('/article/candidate-is-jerk', '198.51.100.34' , 'GET', '200 OK', '2016-07-15 23:54:22+00', 3355605);
INSERT INTO log VALUES ('/article/candidate-is-jerk', '203.0.113.12' , 'GET', '200 OK', '2016-07-21 19:54:22+00', 3355606);
INSERT INTO log VALUES ('/article/bears-love-berries', '203.0.113.12' , 'GET', '200 OK', '2016-07-21 12:54:22+00', 3355607);
INSERT INTO log VALUES ('/article/goats-eat-googles', '198.51.100.76' , 'GET', '200 OK', '2016-07-21 01:54:22+00', 3355608);
INSERT INTO log VALUES ('/article/goats-eat-googles', '198.51.100.76' , 'GET', '404 NOT FOUND', '2016-07-29 07:54:22+00', 3355609);
INSERT INTO log VALUES ('/article/candidate-is-jerk', '198.51.100.76' , 'GET', '404 NOT FOUND', '2016-07-29 20:54:22+00', 3355610);
INSERT INTO log VALUES ('/article/goats-eat-googles', '198.51.100.34' , 'GET', '200 OK', '2016-07-29 19:54:22+00', 3355611);

Query:

SELECT DATE(time) AS day,
(
  (SELECT CAST(COUNT(status) AS FLOAT)
  FROM log WHERE status != '200 OK') /
  (SELECT CAST(COUNT(status) AS FLOAT))
) AS views
FROM log GROUP BY day
ORDER BY views
Will Scott
  • 77
  • 3
  • 11
  • Decompose the problem into smaller problems. What is a percent? It's a ratio of two numbers. What are the two numbers? You need a count status = 200 and a count of status != 200. I think if you can count the errors per day, you'll find the rest to be easy. – Brandon May 25 '17 at 22:52
  • @Brandon I can find the count of the status != 200 and the count of the overall requests each day. The question is how do I test if the ratio of the two is greater than 2.5 percent, to put a WHERE or HAVING on the outer SELECT it has a syntax error or a column views not found error – Will Scott May 25 '17 at 23:26

1 Answers1

1

After sharing a test case in sqlfiddle, we could get down to a running code that would do it.

The fiddle is here: http://sqlfiddle.com/#!15/b3ed9/39

Instead of multiplicating the result by 100 you can just check against the already-multiplied 0.025, no? I mean, if you have values between 0-1, the equivalent to 2.5% would be just 0.025.

Thus that command should do:

select views.day, views.cnt from
 (select datelog.day,
  (
   cast(count(case when datelog.status != '200 OK' then 1 else null end) as float)
   /
   cast(count(*) as float)
  ) as cnt
  from (select DATE(time) as day, status from log) as datelog
  group by datelog.day
 ) as views
 where views.cnt > 0.025

The last revision was deeply founded in the reply here: SQL use column from subselect in where clause

Avenger
  • 177
  • 1
  • 8
  • I tried both this and HAVING but I get ProgrammingError: Column 'views' does not exist? Also the reason I'm trying to get it to be 2.5 and not 0.025 is I need to present it to python to display it without having to update ie times by 100 in python. – Will Scott May 25 '17 at 23:00
  • And when I do HAVING instead of where I get a syntax error on GROUP BY not the column error – Will Scott May 25 '17 at 23:09
  • Don't you think you missed a parens on your two subselect? I mean, I read / () AS views -- so instead ( / ) AS views shouldn't do it? – Avenger May 25 '17 at 23:12
  • Woops I accidently mistyped the post, but no it is still giving a views does not exist error – Will Scott May 25 '17 at 23:17
  • 1
    You are also missing 'FROM log' on your second sub-query. :) – Avenger May 25 '17 at 23:24
  • Yes I was missing that, thank you, but the same error is still persisting, any idea why? – Will Scott May 25 '17 at 23:28
  • alright, the answer is here: https://stackoverflow.com/questions/18029786/sql-use-column-from-subselect-in-where-clause I will edit my answer from that one, hope you'll understand what's the point. Basically wrap your whole query as if it were a table. Then the where goes into action. – Avenger May 25 '17 at 23:38
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/145161/discussion-between-will-scott-and-avenger). – Will Scott May 25 '17 at 23:46