Assuming that I have a dataframe with the following values:

    name     start    end     description
0    ag       20       30       None
1    bgb      21       111      'a'
2    cdd      31       101      None
3    bgb      17       19       None
4    ag       20       22       None
5    ag       1        65       'avc'

I want to group by name and, for each group, get the percentage of rows whose description is not null.

For the example above, I expect to see:

    name     percent 
0    ag       33.3      
1    bgb      50
2    cdd      0 

How can I do this?

user3668129
  • Does this answer your question? [How to use groupby in pandas to calculate a percentage / proportion total based on a criteria in another column](https://stackoverflow.com/questions/36987829/how-to-use-groupby-in-pandas-to-calculate-a-percentage-proportion-total-based) – some_programmer Mar 15 '20 at 09:40

1 Answer

Aggregate the mean of the boolean mask created by `Series.notna` (the mean of a boolean mask is the fraction of `True` values), then multiply by 100 and round if necessary:

    df1 = (df['description'].notna()
                            .groupby(df['name'])
                            .mean()
                            .mul(100)
                            .round(2)
                            .reset_index(name='percent'))
    print(df1)
      name  percent
    0   ag    33.33
    1  bgb    50.00
    2  cdd     0.00

Alternative with DataFrame.assign:

    df1 = (df.assign(percent=df['description'].notna())
             .groupby('name')['percent']
             .mean()
             .mul(100)
             .round(2)
             .reset_index())
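
To try the approach end to end, here is a minimal, self-contained sketch. It rebuilds the question's data by hand and assumes the missing descriptions are real `None` values rather than the text `'None'`; the variable name `percent` is only illustrative.

```python
import pandas as pd

# Sample data from the question; missing descriptions are real None values
# (an assumption about how the data was loaded).
df = pd.DataFrame({
    "name": ["ag", "bgb", "cdd", "bgb", "ag", "ag"],
    "start": [20, 21, 31, 17, 20, 1],
    "end": [30, 111, 101, 19, 22, 65],
    "description": [None, "a", None, None, None, "avc"],
})

# notna() yields True/False per row; the group mean of that mask is the
# share of non-null rows, which is then scaled to a percentage.
percent = (
    df["description"].notna()
    .groupby(df["name"])
    .mean()
    .mul(100)
    .round(2)
    .reset_index(name="percent")
)

print(percent)
```

Here `ag` has one non-null description out of three rows, `bgb` one out of two, and `cdd` none.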
jezrael
  • I am getting 100 percent for every group when I run your code on the first example you provided. – David Erickson Mar 15 '20 at 09:47
  • @DavidErickson - That happens because the `None` values in your data are strings, not missing values. Run `df = df.replace({'description':{'None':None}})` before applying my solution. – jezrael Mar 15 '20 at 09:48
  • Got it, makes sense. That is good to know for the future. I'm using pd.read_clipboard(), and I assume you are doing the same. – David Erickson Mar 15 '20 at 09:51
  • @DavidErickson - Yes. By default pandas parses only NaN values as missing; `None` is read in as a string. – jezrael Mar 15 '20 at 09:52
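
The pitfall discussed in these comments can be reproduced with a short sketch. It builds the column with the literal text `'None'` directly instead of going through a parser, since how a reader treats that text may depend on the pandas version. It also swaps in `Series.mask` in place of the `replace` call from the comment, purely as an illustrative alternative; the names `before` and `after` are hypothetical.

```python
import pandas as pd

# Here 'None' is ordinary text, as it might be after pasting the table.
df = pd.DataFrame({
    "name": ["ag", "bgb", "cdd", "bgb", "ag", "ag"],
    "description": ["None", "a", "None", "None", "None", "avc"],
})

# Every value is a non-null string, so each group reports 100 percent.
before = df["description"].notna().groupby(df["name"]).mean().mul(100)

# Turn the text 'None' into a real missing value, then recompute.
# mask() blanks out the entries where the condition is True.
df["description"] = df["description"].mask(df["description"].eq("None"))
after = (
    df["description"].notna()
    .groupby(df["name"])
    .mean()
    .mul(100)
    .round(2)
)

print(before)
print(after)
```

Checking `df['description'].isna().sum()` right after loading is a quick way to spot this: a count of zero on a column that visibly contains `None` suggests the values are strings.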