I have a dataset that looks similar to this:

   USER_ID VARIANT_NAME  REVENUE
0      737      variant      1.5
1     2423      control      0.0
2     9411      control      0.0
3     7311      control      2.3
4     6174      variant      0.0

I have used the following to build a sampling distribution:

import numpy as np

# Number of users in control group
n_con = df_control.shape[0]

# Number of users in variant group
n_var = df_variant.shape[0]

# Probability of generating revenue in control group
p_con = df_con_n0.shape[0] / n_con

# Probability of generating revenue in variant group
p_var = df_var_n0.shape[0] / n_var

p_diffs = []
for i in range(10000):
    # Simulate converted (1) / not converted (0) for every user in each group
    var_converted = np.random.choice([1, 0], size=n_var, p=(p_var, 1 - p_var))
    con_converted = np.random.choice([1, 0], size=n_con, p=(p_con, 1 - p_con))
    # The mean of a 0/1 array is the simulated conversion rate; this also
    # works when a simulated group happens to contain only 0s or only 1s
    var_con_p = var_converted.mean()
    con_con_p = con_converted.mean()
    p_diffs.append(var_con_p - con_con_p)
p_diffs

This works great for a sampling distribution based on whether or not a user generated revenue. However, I would like to build one based on average revenue instead: a sampling distribution of the difference in average revenue between the groups, rather than of the difference in the share of users who generated revenue.
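
To make the goal concrete, here is a minimal sketch of the kind of thing I am after. It assumes a bootstrap (resampling each group's REVENUE values with replacement) is a reasonable way to get there, which I am not sure about. The function name `bootstrap_mean_diffs` and the synthetic data are made up for illustration and are not my real dataset.

```python
import numpy as np
import pandas as pd


def bootstrap_mean_diffs(df, n_iter=10000, seed=0):
    """Bootstrap the difference in mean REVENUE (variant minus control)."""
    rng = np.random.default_rng(seed)
    rev_var = df.loc[df["VARIANT_NAME"] == "variant", "REVENUE"].to_numpy()
    rev_con = df.loc[df["VARIANT_NAME"] == "control", "REVENUE"].to_numpy()
    diffs = np.empty(n_iter)
    for i in range(n_iter):
        # Resample each group with replacement, keeping its original size
        var_sample = rng.choice(rev_var, size=rev_var.size, replace=True)
        con_sample = rng.choice(rev_con, size=rev_con.size, replace=True)
        diffs[i] = var_sample.mean() - con_sample.mean()
    return diffs


# Synthetic stand-in data (assumption): mostly zero revenue, a few purchases
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "USER_ID": np.arange(n),
    "VARIANT_NAME": rng.choice(["variant", "control"], size=n),
    "REVENUE": np.where(rng.random(n) < 0.05,
                        rng.exponential(5.0, size=n).round(2),
                        0.0),
})

mean_diffs = bootstrap_mean_diffs(df, n_iter=2000)
# Centre of the distribution and a rough 95% percentile interval
print(mean_diffs.mean(), np.percentile(mean_diffs, [2.5, 97.5]))
```

Unlike the conversion version, this resamples the observed revenue values directly, so it makes no assumption about how revenue is distributed among paying users.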

Cheers!
