Using the provided table I would like to randomly sample users per day. The number of users to be sampled is specified in the to_sample
column and it is filled by another query. In this example I would like to sample 1 observation for day one and 2 observations for day two (but this will change with every execution of the query, so don't set your mind to these numbers). I would like the users assigned to different days to be different (no overlapping assignment).
drop table if exists test;
create table test (
user_id int,
day_of_week int,
to_sample int);
insert into test values (1, 1, 1);
insert into test values (1, 2, 2);
insert into test values (2, 1, 1);
insert into test values (2, 2, 2);
insert into test values (3, 1, 1);
insert into test values (3, 2, 2);
insert into test values (4, 1, 1);
insert into test values (4, 2, 2);
insert into test values (5, 1, 1);
insert into test values (5, 2, 2);
insert into test values (6, 1, 1);
insert into test values (6, 2, 2);
The expected results would look like this:
create table results (
user_id int,
day_of_week int);
insert into results values (1, 1);
insert into results values (3, 2);
insert into results values (6, 2);
As I said, the number of users to be sampled will be different every time, as should be taken from the to_sample
column in the test table. Also I will run it for 7 days, here there are 2 to keep the example simple.
EDIT:
with day_1 as(
select t.user_id, t.day_of_week
from (select t.*, row_number() over (partition by day_of_week order by randomint(100)) as seqnum
fromtest t where t.day_of_week = 1
) t
where t.seqnum <= (select distinct to_sample fromtest where day_of_week = 1)
)
, day_2 as(
select t.user_id, t.day_of_week
from (select t.*, row_number() over (partition by day_of_week order by randomint(100)) as seqnum
from test t where t.user_id not in (select distinct user_id from day_1) and t.day_of_week = 2
) t
where t.seqnum <= (select distinct to_sample from test where day_of_week = 2)
)
select * from day_1 union all select * from day_2
I tried creating a brute solution based on some of the answers, but still there are some repeated user, even though I remove the user_id
that is already used in day_1 from day_2.
user_id | day_of_week
---------+-------------
4 | 1
4 | 2
1 | 2