1

I want to create a table of grouped percentages across multiple columns using value_counts(normalize=True). I want to do so similar to below, except adding in the "normalize=True" portion, but not sure how to do so using groupby.agg. Any thoughts on how to do so without adding many more lines of code? My real data has a ton of columns, with a scale of 1-5 in each except the grouping columns.

Example below:

df = pd.DataFrame({'Country': ['FR', 'FR', 'GER','GER'],
               'Foo': ['1', '2', '3', '1'],
               'Bar': ['5', '5', '3', '1'],
               'Baz': ['5', '1', '3', '4']})

df2=df.groupby('Country').agg(
{
'Foo': 'value_counts' ,
'Bar': 'value_counts',
'Baz': 'value_counts'
}
      )
  • what's the desired output – U13-Forward Aug 31 '21 at 04:10
  • Matrix table with percentage of each value in each column in the count (so in this case, Country, then 1-5, then % of grouped total for Foo, Bar, and Baz) – David Himmel Aug 31 '21 at 04:20
  • Please post a dataframe example of the output – U13-Forward Aug 31 '21 at 04:21
  • Actually ended up figuring it out. import pandas as pd df = pd.DataFrame({'Country': ['FR', 'FR', 'GER','GER'], 'Foo': ['1', '2', '3', '1'], 'Bar': ['5', '5', '3', '1'], 'Baz': ['5', '1', '3', '4']}) f=lambda x: x/x.sum() df2=df.groupby('Country').agg( { 'Foo': 'value_counts' , 'Bar': 'value_counts', 'Baz': 'value_counts' } ).groupby(level=0).apply(f).sort_index df2 – David Himmel Sep 01 '21 at 16:12

1 Answers1

0

EDIT: Figured it out! Will close.

import pandas as pd


df = pd.DataFrame({'Country': ['FR', 'FR', 'GER','GER'],
               'Foo': ['1', '2', '3', '1'],
               'Bar': ['5', '5', '3', '1'],
               'Baz': ['5', '1', '3', '4']})

f=lambda x: x/x.sum()

df2=df.groupby('Country').agg(
{
'Foo': 'value_counts' ,
'Bar': 'value_counts',
'Baz': 'value_counts'
}
      ).groupby(level=0).apply(f).sort_index

df2