How to calculate percentage value for two binnary columns and one nominal column in DataFrame ine Python Pandas?

Question

I have DataFrame in Python Pandas like below:

data type:

ID - int64
X1 - int64
X2 - int64
CH - int64

ID X1 X2 CP CH

111 1 0 10-20 1

222 1 0 10-20 1

333 0 1 30-40 0

444 1 1 30-40 1

555 0 1 30-40 1

And I need to create new column "COL1" answered on question:

What percentage of customers had CH = '1' per combination: CP x X1 ='1' and CP x X2 = '1'

So as a result I need something like below:

col_X	col_CP	CH_perc
X1	10-20	`1.00 <- 2 IDs had X1 = '1' and CP = '10-20' and 1 of them had CH = '1', so 2/2 = 1.00`
X1	20-30	`0 <- non of IDs had X1 = '1' and CP = '20-30'`
X1	30-40	`1.00 <- 1 ID had X1 = '1' and CP = '30-40' and 1 of them had CH = '1', so 1/1 = 1.00`
X1	40-50	`0 <- non of IDs had X1 = '1' and CP = '40-50'`
X2	10-20	`0 <- non of IDs had X2 = '1' and CP = '10-20'`
X2	20-30	`0 <- non of IDs had X2 = '1' and CP = '20-30'`
X2	30-40	`0.66** <- 3 IDs had X1 = '1' and CP = '30-40' and 2 of them had CH = '1', so 2/3 = 0.66`
X2	40-50	`0 <- non of IDs had X2 = '1' and CP = '40-50'`

How can I do that in Python Pandas ?

no client has such a compartment in the cp variable, but it should be in the output :) — unbik, Oct 17 '22 at 17:26
Do you know in advance the list of all possible CP values because I don;t see the values 20-30 and 40-50 in your sample dataframe — Shubham Sharma, Oct 17 '22 at 17:29
All possible values are: 10-20, 20-30, 30-40, 40-50, but in could be possible that no client has some cp :) — unbik, Oct 17 '22 at 17:37

score 0 · Answer 1 · answered Oct 17 '22 at 17:43

Code

cps = ['10-20', '20-30', '30-40', '40-50']

x = df.filter(like='X')
x = x.where(x == 1).mul(df['CH'], axis=0)
x = x.groupby(df['CP']).mean().reindex(cps).fillna(0)

How it works?

Filter the X like columns
mask the values other than 1
multiply by CH to ensure that only values where CH == 1 are considered
groupby CP and agg with mean to calculate percentage
reindex to ensure all CP's are present

Result

        X1        X2
CP                  
10-20  1.0  0.000000
20-30  0.0  0.000000
30-40  1.0  0.666667
40-50  0.0  0.000000

ID	X1	X2	CP	CH
111	1	0	10-20	1
222	1	0	10-20	1
333	0	1	30-40	0
444	1	1	30-40	1
555	0	1	30-40	1

How to calculate percentage value for two binnary columns and one nominal column in DataFrame ine Python Pandas?

1 Answers1

Code

How it works?

Result