Find Percentage of each class for every ID

Question

I am working with a data frame that has 20 ids and for each Id, there are about 10-15 stores, and each store is assigned a status (Zero, Negative and Positive).

Data:

data = 
ID      STORE  STATUS           
100001  1      zero 
100001  2      positive 
100001  3      zero 
100001  4      negative     
100001  5      zero    
100001  6      zero 
100001  7      positive 
100001  8      negative 
....

Problem:

I would like to find out what percentage of stores are zeros/ positive/negative for each id.

My attempt:

zero = 0

for item in data['ID'].unique():
    items = data[data['ID']==item]
    lenght = len(items)
    print(lenght)
    for index, i in items.iterrows():
        if i['STATUS'] == 'Zero':
            zero += 1
            pct = zero/lenght
    items['PCT'] = zero/lenght
    print(items)

My result from code above:

ID      STORE  STATUS.    PCT           
100001  1      zero       0.5
100001  2      positive   0.5
100001  3      zero       0.5
100001  4      negative   0.5
100001  5      zero       0.5
100001  6      zero       0.5
100001  7      positive   0.5
100001  8      negative   0.5

Expected result :

ID      STORE  STATUS.    PCT           
100001  1      zero       0.5
100001  2      positive   0.25
100001  3      zero       0.5
100001  4      negative   0.25
100001  5      zero       0.5
100001  6      zero       0.5
100001  7      positive   0.25
100001  8      negative   0.25

If you're using explicit loops with pands, you're often doing something wrong. Use its built-in grouping methods. — Barmar, Feb 10 '22 at 18:01
I wonder if this is what you are looking for: https://stackoverflow.com/questions/50169311/getting-percentage-and-count-python — Mark, Feb 10 '22 at 18:03

score 1 · Answer 1 · answered Feb 10 '22 at 18:12

IIUC, you want to transform the share of each status across stores. Then you can groupby "ID" and "Status" and count how many stores are in each group and transform it for the column and divide it by the length:

df['PCT'] = df.groupby(['ID','STATUS'])['STORE'].transform('count') / df.groupby('ID')['ID'].transform('count')

Output:

       ID  STORE    STATUS   PCT
0  100001      1      zero  0.50
1  100001      2  positive  0.25
2  100001      3      zero  0.50
3  100001      4  negative  0.25
4  100001      5      zero  0.50
5  100001      6      zero  0.50
6  100001      7  positive  0.25
7  100001      8  negative  0.25

Find Percentage of each class for every ID

1 Answers1