I'm trying to calculate a simple weighted average over multiple columns at once grouped by an index column with some of the column values being NaN.
Below is a sample dataset:
df = pd.DataFrame(np.random.choice([0,1,np.nan], size=(5,5)), columns=list('ABCDE'))
df['F'] = 'XYZ'
df['weight'] = 5 + np.random.sample(5) * 5
A B C D E F weight
0.0 0.0 0.0 NaN NaN XYZ 7.754209
0.0 1.0 1.0 0.0 0.0 XYZ 5.811653
0.0 NaN 1.0 0.0 1.0 XYZ 7.858809
1.0 0.0 1.0 0.0 1.0 XYZ 7.690689
NaN 1.0 0.0 0.0 0.0 XYZ 5.092012
And below is my attempt to calculate the weighted average by excluding the NaN values from both the columns as well as weights.
def weighted_avg(df, index_col, weight_col):
cols = [c for c in df.columns if c not in ['weight_col', 'index_col']]
df.loc[:, cols] = df.loc[:, cols].mul(df[weight_col], axis=0)
agg = df.groupby(df[index_col]).sum()
agg.loc[:, cols] = agg.loc[:, cols].div(agg[weight_col], axis=0)
return agg
weighted_avg(df, 'F', 'weight')
However, I'm getting this error:
TypeError: can't multiply sequence by non-int of type 'float'
I checked the data types and all columns except F
are float
.
This is the expected output:
F A B C D E
XYZ 0.26414542 0.413823896 0.624460453 0 0.587812429
I'm relatively new to Python and did find similar problems while extensively searching SO but still couldn't make this simple solution work.
Any help would be much appreciated. TIA.