I want to compute a grouped weighted average where the weights are stored as cell values and the values to be averaged are the column names:
import pandas as pd

df_dict = {1: {0: 10, 1: 15, 2: 50, 3: 10, 4: 44},
           4: {0: 12, 1: 15, 2: 48, 3: 12, 4: 63},
           7.5: {0: 17, 1: 23, 2: 99, 3: 20, 4: 66},
           90: {0: 25, 1: 18, 2: 102, 3: 17, 4: 10},
           'grouping': {0: '38', 1: '38', 2: '38', 3: '86', 4: '48'}}
df = pd.DataFrame(df_dict)
display(df)
df.groupby(['grouping']).sum().reset_index()
But I am not sure how to compute the weighted average efficiently. numpy.average supports weights, so it could be an option, but using it would mean calling df.apply with a custom function, which is not well optimized.
Is there a better way?
Edit: i.e., is there a better way than using:
import numpy as np

df = df.groupby(['grouping']).sum().reset_index()
display(df)

def wavg(x):
    # Drop the group key, leaving one weight per value column
    values = x.drop('grouping')
    values = values.reset_index()
    # The former column names are the values; the cells are the weights
    values.columns = ['value', 'weight']
    return np.average(list(values.value),
                      weights=list(values.weight))

df['wavg_location'] = df.apply(wavg, axis=1)
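For reference, here is a vectorized sketch of the kind of alternative I have in mind (assuming the column labels can be cast to floats; the result column name `wavg_location` is just illustrative). It sums the weights per group, then computes sum(value * weight) / sum(weight) row-wise with plain NumPy broadcasting instead of df.apply:

```python
import numpy as np
import pandas as pd

df_dict = {1: {0: 10, 1: 15, 2: 50, 3: 10, 4: 44},
           4: {0: 12, 1: 15, 2: 48, 3: 12, 4: 63},
           7.5: {0: 17, 1: 23, 2: 99, 3: 20, 4: 66},
           90: {0: 25, 1: 18, 2: 102, 3: 17, 4: 10},
           'grouping': {0: '38', 1: '38', 2: '38', 3: '86', 4: '48'}}
df = pd.DataFrame(df_dict)

# Sum the weights within each group; 'grouping' becomes the index,
# and the value columns (1, 4, 7.5, 90) survive.
g = df.groupby('grouping').sum()

# Column labels are the values to average; cells are the summed weights.
vals = g.columns.to_numpy(dtype=float)      # shape (n_cols,)
weights = g.to_numpy(dtype=float)           # shape (n_groups, n_cols)

# Row-wise weighted average via broadcasting.
g['wavg_location'] = (weights * vals).sum(axis=1) / weights.sum(axis=1)
print(g)
```

This avoids the per-row Python function call entirely, since both the numerator and the denominator are computed in one array operation each.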