Please see my example below, how can I return the data from the groupby on all 3 levels of the original MultiIndex?
In this example: I want to see the totals by brand. I have now applied a workaround using map (see below, this shows the output that I hope to get straight from the groupby).
brands = ['Tesla','Tesla','Tesla','Peugeot', 'Peugeot', 'Citroen', 'Opel', 'Opel', 'Peugeot', 'Citroen', 'Opel']
years = [2018, 2017,2016, 2018, 2017, 2017, 2018, 2017,2016, 2016,2016]
owners = ['Tesla','Tesla','Tesla','PSA', 'PSA', 'PSA', 'PSA', 'PSA','PSA', 'PSA', 'PSA']
index = pd.MultiIndex.from_arrays([owners, years, brands], names=['owner', 'year', 'brand'])
data = np.random.randint(low=100, high=1000, size=len(index), dtype=int)
weight = np.random.randint(low=1, high=10, size=len(index), dtype=int)
df = pd.DataFrame({'data': data, 'weight': weight},index=index)
df.loc[('PSA', 2017, 'Opel'), 'data'] = np.nan
df.loc[('PSA', 2016, 'Opel'), 'data'] = np.nan
df.loc[('PSA', 2016, 'Citroen'), 'data'] = np.nan
df.loc[('Tesla', 2016, 'Tesla'), 'data'] = np.nan
out:
data weight
owner year brand
PSA 2016 Citroen NaN 5
Opel NaN 5
Peugeot 250.0 2
2017 Citroen 469.0 4
Opel NaN 5
Peugeot 768.0 5
2018 Opel 237.0 6
Peugeot 663.0 4
Tesla 2016 Tesla NaN 3
2017 Tesla 695.0 6
2018 Tesla 371.0 5
I have tried with the index and "level" as well as with columns and "by". And I have tried with "as_index = False" .sum() as well as with "group_keys()" = False and .apply(sum). But I am not able to get the brand column back in the groupby output:
grouped = df.groupby(level=['owner', 'year'], group_keys=False) #type: <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
grouped.apply(sum)
out:
data weight group_data
owner year
PSA 2016 250.0 12.0 750.0
2017 1237.0 14.0 3711.0
2018 900.0 10.0 1800.0
Tesla 2016 0.0 3.0 0.0
2017 695.0 6.0 695.0
2018 371.0 5.0 371.0
Similar:
grouped = df.groupby(by=['owner', 'year'], group_keys=False) #type: <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
grouped.apply(sum)
or:
grouped = df.groupby(by=['owner', 'year'], as_index=False, group_keys=False) #type: <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
grouped.sum()
Workaround:
grouped = df.groupby(level=['owner', 'year'], group_keys=False) #type: <class 'pandas.core.groupby.generic.DataFrameGroupBy'>
df_owner_year = grouped.apply(sum)
s_data = df_owner_year['data']
df['group_data'] = df.index.map(s_data)
df
out:
data weight group_data
owner year brand
PSA 2016 Citroen NaN 5 250.0
Opel NaN 5 250.0
Peugeot 250.0 2 250.0
2017 Citroen 469.0 4 1237.0
Opel NaN 5 1237.0
Peugeot 768.0 5 1237.0
2018 Opel 237.0 6 900.0
Peugeot 663.0 4 900.0
Tesla 2016 Tesla NaN 3 0.0
2017 Tesla 695.0 6 695.0
2018 Tesla 371.0 5 371.0