Ok, so lets make some fake data, and do the count thing:
WITH fake_data(client_id, visit_date) as (
SELECT * FROM VALUES
-- this person has visted once
(1, '2022-04-14'::date),
-- this person has visited 3 timw in the year
(3, '2022-04-13'::date),
(3, '2022-03-13'::date),
(3, '2022-02-13'::date),
-- this person is a huge vistor, but 1 is outside the with in last year.
(5, '2022-04-12'::date),
(5, '2022-03-12'::date),
(5, '2022-02-12'::date),
(5, '2022-01-12'::date),
(5, '2020-02-12'::date)
)
SELECT *,
count(distinct visit_date) over (partition by client_id) as total_visits
FROM fake_data
WHERE visit_date >= dateadd('year', -1, '2022-04-14' /* CURRENT_DATE */)
boom:
CLIENT_ID |
VISIT_DATE |
TOTAL_VISITS |
1 |
2022-04-14 |
1 |
3 |
2022-04-13 |
3 |
3 |
2022-03-13 |
3 |
3 |
2022-02-13 |
3 |
5 |
2022-04-12 |
4 |
5 |
2022-03-12 |
4 |
5 |
2022-02-12 |
4 |
Now to make those into those thee group/categories.
SELECT *,
count(distinct visit_date) over (partition by client_id) as total_visits,
case
when total_visits = 1 then 1
when total_visits <= 3 then 2
when total_visits > 3 then 3
end as group_id
FROM fake_data
WHERE visit_date >= dateadd('year', -1, '2022-04-14' /* CURRENT_DATE */)
Now some math, of which I will wrap that into a sub-select (but also push a couple things down into it)
WITH fake_data(client_id, visit_date) as (
SELECT * FROM VALUES
-- this person has visted once
(1, '2022-04-14'::date),
-- this person has visited 3 timw in the year
(3, '2022-04-13'::date),
(3, '2022-04-11'::date),
(3, '2022-04-09'::date),
-- this person is a huge vistor, but 1 is outside the with in last year.
(5, '2022-04-12'::date),
(5, '2022-03-12'::date),
(5, '2022-02-12'::date),
(5, '2022-01-12'::date),
(5, '2020-02-12'::date)
)
SELECT group_id
,count(distinct client_id) as count_of_group_members
,sum(total_visits) as sum_of_group_visit
,avg(visit_gap_in_days) as avg_group_day_diff
,stddev(visit_gap_in_days) as stddev_group_day_diff
FROM (
SELECT *,
count(distinct visit_date) over (partition by client_id) as total_visits,
case
when total_visits = 1 then 1
when total_visits <= 3 then 2
when total_visits > 3 then 3
end as group_id,
lag(visit_date) over (partition by client_id order by visit_date) as prior_visit_date,
datediff('day', prior_visit_date, visit_date) as visit_gap_in_days
FROM fake_data
WHERE visit_date >= dateadd('year', -1, '2022-04-14' /* CURRENT_DATE */)
)
GROUP BY 1
ORDER BY 1
GROUP_ID |
COUNT_OF_GROUP_MEMBERS |
SUM_OF_GROUP_VISIT |
AVG_GROUP_DAY_DIFF |
STDDEV_GROUP_DAY_DIFF |
1 |
1 |
1 |
|
|
2 |
1 |
9 |
2 |
0 |
3 |
1 |
16 |
30 |
1.732050808 |
Wozers, that sum of visits is wrong, I have summed my sums..
So here given the count(distinct visitno)
I cannot sum that, as it becomes the sum of sums, AND I cannot do a count(*) because we have just noticed there are duplicates (otherwise the distinct is not needed). And I assume you have not stripped the rows down, as there is some "other details that you will want"
But anyways. This is the great things about SQL, you can answer anything, but you have to know the Question, and know the Data so you can know which assumptions can be held true for your data.