16

I'm trying to work out how to use the groupby function in pandas to work out the proportions of values per year with a given Yes/No criteria.

For example, I have a dataframe called names:

  Name  Number  Year   Sex Criteria
0  name1     789  1998  Male      N
1  name1     688  1999  Male      N
2  name1     639  2000  Male      N
3  name2     551  1998  Male      Y
4  name2     499  1999  Male      Y

I can use

namesgrouped = names.groupby(["Sex", "Year", "Criteria"]).sum()

to get:

                   Number
Sex    Year      Criteria
Male   1998 N        14507
            Y         2308
       1999 N        14119
            Y         2331

and so on. I would like the 'Number Criteria' column to show the % of the total for each gender and year - so instead of N = 14507 and Y = 2308 for 1998 above I'd have N = 86.27% and Y = 13.73%.

Can anyone advise how to do this?

IanS
  • 15,771
  • 9
  • 60
  • 84
fuzzy_logic_77
  • 387
  • 1
  • 4
  • 14
  • 1
    Possible duplicate of [Pandas percentage of total with groupby](http://stackoverflow.com/questions/23377108/pandas-percentage-of-total-with-groupby) – IanS May 02 '16 at 17:26
  • Unfortunately the example you linked to didn't work for me, as I have an extra layer in by groupby! Does anyone know how to work out the percentage when dealing with a hierarchy like mine? – fuzzy_logic_77 Jul 14 '16 at 21:14

1 Answers1

35

This question is a direct extension of the suggested duplicate. Borrowing from the accepted answer, this will work:

In [46]: namesgrouped.groupby(level=[0, 1]).apply(lambda g: g / g.sum())
Out[46]: 
                      Number
Sex  Year Criteria          
Male 1998 N         0.588806
          Y         0.411194
     1999 N         0.579612
          Y         0.420388
     2000 N         1.000000

Edit: a transform operation might be faster than apply:

namesgrouped / namesgrouped.groupby(level=[0, 1]).transform('sum')
IanS
  • 15,771
  • 9
  • 60
  • 84
  • I there a way to achieve the same without `apply`? – NelsonGon Sep 03 '19 at 04:19
  • 1
    @NelsonGon try something like `namesgrouped.div(namesgrouped.groupby(level=[0, 1]).transform('sum'))` – IanS Sep 03 '19 at 09:16
  • Could you please add this to your answer? I have seen transform used quite a lot but not sure how it works. Could you please elaborate more on what transform does? I see that python and R share so many similar function's but R's transform is AFAIK behaving different to what I see being done for instance here. – NelsonGon Sep 03 '19 at 09:33
  • 1
    @NelsonGon `GroupBy.transform` expands the result of the `groupby` operation to the entire length of the original dataframe. For instance here, instead of having the result of the sum summarized for each group, `transform` means that the (same) sum will be expanded to the entire group. That's how we can divide each group by its sum. – IanS Sep 03 '19 at 10:08
  • Thanks, so how does transforming by sum give us the proportions? Sorry, I tried this hours ago and couldn't quite get the result. – NelsonGon Sep 03 '19 at 10:10