0

Using the provided table I would like to sample let's say 2 users per day so that users assigned to the two days are different. Of course the problem I have is more sophisticated, but this simple example gives the idea.

drop table if exists test; 

create table test (
user_id int,
day_of_week int);

insert into test values (1, 1);
insert into test values (1, 2);
insert into test values (2, 1);
insert into test values (2, 2);
insert into test values (3, 1);
insert into test values (3, 2);
insert into test values (4, 1);
insert into test values (4, 2);
insert into test values (5, 1);
insert into test values (5, 2);
insert into test values (6, 1);
insert into test values (6, 2);

The expected results would look like this:

create table results (
user_id int,
day_of_week int);

insert into results values (1, 1);
insert into results values (2, 1);
insert into results values (3, 2);
insert into results values (6, 2);
abu
  • 737
  • 5
  • 8
  • 19

1 Answers1

1

You can use window functions. Here is an example . . . although the details do depend on your database (functions for random numbers vary by database):

select t.*
from (select t.*, row_number() over (partition by day_of_week order by random()) as seqnum
      from test t
     ) t 
where seqnum <= 2;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thanks, this works :) One more question, do you know how to make it work if I wanted to make the number of observations to sample vary by day? I will edit the question accordingly. – abu Feb 07 '18 at 13:00
  • @abu . . . Do not edit the question. Ask another question! This one is answered. – Gordon Linoff Feb 07 '18 at 13:04
  • The question is [here](https://stackoverflow.com/questions/48665786/sampling-without-replacement-with-a-different-sample-size-per-group-in-sql). – abu Feb 07 '18 at 14:03