I have a simple data set of customers (about 40,000k). It looks like:
customerid, group, other_variable
a,blue,y
b,blue,x
c,blue,z
d,green,y
e,green,d
f,green,r
g,green,e
For each group, I want to randomly select Y customers, along with their other variable(s). The catch is that I want two random selections of Y customers for each group, i.e.:
4000 random green customers, randomly split into two sets of 2000
and 4000 random blue customers, randomly split into two sets of 2000
This is because I have different messages to give to the two splits. I'm not sampling with replacement; each selected customer must be unique.
I would prefer a solution in PROC SQL, but I'm happy with an alternative SAS solution if PROC SQL isn't ideal.
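To pin down the logic being asked for (a stratified simple random sample without replacement, then a random split of each group's sample into two message sets), here is a minimal, language-neutral sketch in Python. It is not the requested PROC SQL or SAS solution. The function name split_samples and its parameters per_group and seed are hypothetical, and rows are assumed to be (customerid, group, other_variable) tuples like the sample data above.

```python
import random
from collections import defaultdict


def split_samples(rows, per_group, seed=None):
    """Hypothetical helper: for each group, draw per_group customers without
    replacement and split them into two random halves (message A / message B)."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[1]].append(row)  # row[1] is the group column

    result = {}
    for group, members in by_group.items():
        if per_group > len(members):
            raise ValueError(f"group {group!r} has only {len(members)} customers")
        # random.sample draws without replacement, so no customer appears twice;
        # its output order is already random, so slicing gives a random split.
        chosen = rng.sample(members, per_group)
        half = per_group // 2
        # If per_group is odd, the second set gets the extra customer.
        result[group] = (chosen[:half], chosen[half:])
    return result


# Usage with the sample rows from the question (per_group=2 instead of 4000).
customers = [
    ("a", "blue", "y"), ("b", "blue", "x"), ("c", "blue", "z"),
    ("d", "green", "y"), ("e", "green", "d"), ("f", "green", "r"), ("g", "green", "e"),
]
sets = split_samples(customers, per_group=2, seed=1)
for group, (msg_a, msg_b) in sorted(sets.items()):
    print(group, [r[0] for r in msg_a], [r[0] for r in msg_b])
```

With the real data, per_group=4000 would give two disjoint sets of 2000 per group. A SAS version would need the same two steps: sample within each group, then assign each sampled customer to one of two halves.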