Creating a column in a dataframe based on multiple conditions based on values from different columns.
The goal is to get an indication of when the first interesting action was of a customer, this will be represented with a 1 under t0.
Dataframe is structured as below:
cust_id first_act prod_1 prod_2 t0
0 1 1 1
22 2
23 2 1
24 2 1
25 2
26 3 1
27 3
28 3
29 4
30 4
I want to assign a value to the column t0 based on the following conditions:
if customer has a 1 under prod_1: assign value 1 to t0 at the index where it has the 1 under prod_1.
if customer does not have a 1 under prod_1, check if customer has a 1 under prod_2 and if true assign t0 the value of 1 at the index where the condition is true.
lastly: if the customer does not have prod_1 or prod_2 but does have a 1 under first_act, assign the value 1 to the index where first act is true, under t0.
After these conditions there should only be one value in t0 for each customer.
Expected output for cust_id 2:
cust_id first_act prod_1 prod_2 t0
0 1 1 1
22 2 1
23 2 1
24 2 1 1
25 2
26 3 1
27 3
28 3
29 4
30 4
I tried doing this with nested np.where statements but that did not work as the following:
df['t0'] = np.where(df['prod_1'] == 1, 1 ,
np.where(df['prod_2'] == 1, 1,
np.where(df['first_act'] == 1, 1, 0)))
Adds 1's to t0 on multiple locations.
Update
@Jeffyx I don't know if this clears it up a bit but what I thought of was this:
if prod_1 == 1:
t0 = 1 at index of prod_1 == 1
if not prod_1 == 1:
if prod_2 == 1:
t0 = 1 at index of prod_2 == 1
if not prod_1 == 1 and not prod_2 == 1:
if first_act == 1:
t0 = 1 at index of first_act == 1