I have a DataFrame in Python Pandas like the one below.
Data types:
ID - int64
X1 - int64
X2 - int64
CH - int64
ID | X1 | X2 | CP | CH |
---|---|---|---|---|
111 | 1 | 0 | 10-20 | 1 |
222 | 1 | 0 | 10-20 | 1 |
333 | 0 | 1 | 30-40 | 0 |
444 | 1 | 1 | 30-40 | 1 |
555 | 0 | 1 | 30-40 | 1 |
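To make the example reproducible, the sample data can be built roughly like this. It is a minimal sketch that assumes `CP` is stored as plain strings, which the dtype list above does not state:

```python
import pandas as pd

# Sample data from the question; CP is assumed to hold string labels
df = pd.DataFrame({
    "ID": [111, 222, 333, 444, 555],
    "X1": [1, 1, 0, 1, 0],
    "X2": [0, 0, 1, 1, 1],
    "CP": ["10-20", "10-20", "30-40", "30-40", "30-40"],
    "CH": [1, 1, 0, 1, 1],
})

print(df.dtypes)
```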
I need to create a new column "COL1" that answers the following question:
- What percentage of customers had CH = '1' for each combination of CP with X1 = '1', and of CP with X2 = '1'?
So as a result I need something like the table below:
col_X | col_CP | CH_perc |
---|---|---|
X1 | 10-20 | 1.00 <- 2 IDs had X1 = '1' and CP = '10-20' and 2 of them had CH = '1', so 2/2 = 1.00 |
X1 | 20-30 | 0 <- none of the IDs had X1 = '1' and CP = '20-30' |
X1 | 30-40 | 1.00 <- 1 ID had X1 = '1' and CP = '30-40' and it had CH = '1', so 1/1 = 1.00 |
X1 | 40-50 | 0 <- none of the IDs had X1 = '1' and CP = '40-50' |
X2 | 10-20 | 0 <- none of the IDs had X2 = '1' and CP = '10-20' |
X2 | 20-30 | 0 <- none of the IDs had X2 = '1' and CP = '20-30' |
X2 | 30-40 | 0.66 <- 3 IDs had X2 = '1' and CP = '30-40' and 2 of them had CH = '1', so 2/3 = 0.66 |
X2 | 40-50 | 0 <- none of the IDs had X2 = '1' and CP = '40-50' |
How can I do that in Python Pandas?
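One possible approach, given only as a sketch: filter on each flag column, take the mean of `CH` per `CP` band, and reindex against the full list of bands so that empty combinations show up as 0. The list `cp_bands` is an assumption here, because the bands 20-30 and 40-50 do not occur in the sample data and have to be supplied from somewhere. `CP` is also assumed to hold strings.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [111, 222, 333, 444, 555],
    "X1": [1, 1, 0, 1, 0],
    "X2": [0, 0, 1, 1, 1],
    "CP": ["10-20", "10-20", "30-40", "30-40", "30-40"],
    "CH": [1, 1, 0, 1, 1],
})

# Assumed full list of CP bands (two of them are absent from the sample)
cp_bands = ["10-20", "20-30", "30-40", "40-50"]

parts = []
for col in ["X1", "X2"]:
    # Keep rows where the flag is 1, then average CH within each CP band;
    # the mean of a 0/1 column is the share of rows with CH == 1
    perc = (
        df.loc[df[col] == 1]
        .groupby("CP")["CH"]
        .mean()
        .reindex(cp_bands, fill_value=0)  # bands with no rows become 0
    )
    parts.append(
        pd.DataFrame({
            "col_X": col,
            "col_CP": list(perc.index),
            "CH_perc": perc.to_numpy(),
        })
    )

result = pd.concat(parts, ignore_index=True)
print(result)
```

Note that this yields the unrounded share for X2 and 30-40 (2/3); `result["CH_perc"].round(2)` could be applied if two decimals are wanted.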