-1

Basically, I need to create a table that will show the avg users per day of the week and if there is a 20% positive or negative deviation in the numbers from the average I need it to say 'Positive outlier' or 'negative outlier'. I have currently calculated the avg users but I do not know how to write the case statement which will show what I need to derive from the data. This is what I have so far:

SELECT
  EXTRACT(DAYOFWEEK FROM date) AS dayOfWeek,
  AVG(users) AS average_users,
FROM
(
  SELECT
    PARSE_DATE('%Y%m%d', date) AS date,
    COUNT(DISTINCT(fullVisitorId)) AS users
  FROM `table`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE ('%Y%m%d', DATE_SUB(CURRENT_DATE, INTERVAL 14 DAY))
    AND FORMAT_DATE ('%Y%m%d', DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY))
  GROUP BY date
)
GROUP BY dayOfWeek
ORDER BY dayOfWeek ASC
jarlh
  • 42,561
  • 8
  • 45
  • 63
  • `DISTINCT` is not a function, it's a _set quantifier_. Simply write `COUNT(DISTINCT fullVisitorId) AS users` to make code clearer. – jarlh Oct 20 '20 at 08:45
  • Make it easy to assist you: show us some sample table data and the expected result - all as formatted text (no images). Simplify if possible, [mcve]. – jarlh Oct 20 '20 at 08:46
  • What does 20% positive or negative deviation mean? Sample data and a *clear* explanation would help. – Gordon Linoff Oct 20 '20 at 11:45

1 Answers1

0

put this code in CTE eg,

with cte as (SELECT
  EXTRACT(DAYOFWEEK FROM date) AS dayOfWeek,
  AVG(users) AS average_users,
FROM
(
  SELECT
    PARSE_DATE('%Y%m%d', date) AS date,
    COUNT(DISTINCT(fullVisitorId)) AS users
  FROM `table`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE ('%Y%m%d', DATE_SUB(CURRENT_DATE, INTERVAL 14 DAY))
    AND FORMAT_DATE ('%Y%m%d', DATE_SUB(CURRENT_DATE, INTERVAL 1 DAY))
  GROUP BY date
)
GROUP BY dayOfWeek
ORDER BY dayOfWeek ASC)

select dayOfWeek,average_users,
case when average_users > 20 then "positive'
else 'negative' end as outlier
 from cte;
Somy
  • 1,474
  • 1
  • 4
  • 13