If you are looking for an example, https://pbpython.com/weighted-average.html is quite useful.
To make my remark a bit more practical, here is some data first:
import numpy as np
import pandas as pd

a = np.random.rand(8, 2)
df = pd.DataFrame(a, columns=['Dummy_1', 'Dummy_2'])
df['NotionalTraded'] = [1,0,1,0.5,1,0,1,0.5]
df['index'] = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c']
    Dummy_1   Dummy_2  NotionalTraded index
0  0.793964  0.027435             1.0     a
1  0.719204  0.426740             0.0     a
2  0.821932  0.875693             1.0     a
3  0.007710  0.447273             0.5     a
4  0.138741  0.779036             1.0     b
5  0.936195  0.208430             0.0     b
6  0.718698  0.312217             1.0     c
7  0.508001  0.731941             0.5     c
For a plain (unweighted) mean you can do something like
In [74]: df.groupby('index')[['Dummy_1', 'Dummy_2']].agg(np.mean)
Out[74]:
Dummy_1 Dummy_2
index
a 0.585703 0.444285
b 0.537468 0.493733
c 0.613349 0.522079
The function passed to agg gets applied to each column of each group in succession. However, if your function needs more than one parameter, you just get an error:
In [75]: def wn(x, weights):
...: pass
...:
In [76]: df.groupby('index')[['Dummy_1', 'Dummy_2']].agg(wn)
TypeError: wn() missing 1 required positional argument: 'weights'
pandas will forward additional arguments to the aggregation function, but in your case that alone does not help. The reason is that the columns have already been filtered down by the [['Dummy_1', 'Dummy_2']] selection, and agg iterates over the columns of each group one at a time, so you can't really write functions that use multiple columns.
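To make that concrete, here is a minimal sketch (scaled_mean is just an illustrative name, not from your code) that forwards an extra argument through agg. The forwarding itself works, but each call only ever receives a single column of a single group as a Series, so NotionalTraded is simply not visible from inside the function:
def scaled_mean(x, factor):
    # x is ONE column of ONE group, passed as a Series;
    # the weights column is out of reach from in here
    return x.mean() * factor

# the extra positional argument is forwarded by agg to every call
df.groupby('index')[['Dummy_1', 'Dummy_2']].agg(scaled_mean, 2)
# result: the Out[74] means doubled, computed column by column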
There is another function, apply, which is much better suited to your use case. For each group it takes the whole DataFrame at once and passes it to the function. The solution from the blog article above translates to
In [79]: df.groupby('index').apply(lambda x: np.average(x[['Dummy_1', 'Dummy_2']], weights=x['NotionalTraded'], axis=0))
Out[79]:
index
a [0.6479006083188169, 0.45070588200164324]
b [0.13874076610128128, 0.779035685092115]
c [0.6484653712088316, 0.45212507743936053]
dtype: object
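The result above is a Series of raw numpy arrays (hence dtype: object). If you would rather get a proper DataFrame with named columns, one variation along the same lines (a sketch, not from the article) is to wrap the result in a pd.Series inside the function:
cols = ['Dummy_1', 'Dummy_2']
df.groupby('index').apply(
    # apply turns per-group Series results into a DataFrame with those names as columns
    lambda x: pd.Series(np.average(x[cols], weights=x['NotionalTraded'], axis=0), index=cols)
)
# one row per group, with Dummy_1/Dummy_2 as real columns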
Or, as you mentioned, to avoid lambdas:
In [80]: def wn(x, weight_col, used_col):
...: return np.average(x[used_col], weights=x[weight_col], axis=0)
...:
In [81]: df.groupby('index').apply(wn, 'NotionalTraded', ['Dummy_1', 'Dummy_2'])
Out[81]:
index
a [0.6479006083188169, 0.45070588200164324]
b [0.13874076610128128, 0.779035685092115]
c [0.6484653712088316, 0.45212507743936053]
dtype: object
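apply also accepts keyword arguments, so the same call can be spelled more explicitly (just a variant of In [81]):
df.groupby('index').apply(wn, weight_col='NotionalTraded', used_col=['Dummy_1', 'Dummy_2'])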
Here apply forwards the extra arguments to the aggregation function. If you don't want to pass them every time, you can create a "dummy" function with the arguments already filled in. One way is functools.partial:
In [86]: import functools
In [87]: wn_by_notionalTraded = functools.partial(wn, weight_col='NotionalTraded', used_col=['Dummy_1', 'Dummy_2'])
In [88]: df.groupby('index').apply(wn_by_notionalTraded)
Out[88]:
index
a [0.6479006083188169, 0.45070588200164324]
b [0.13874076610128128, 0.779035685092115]
c [0.6484653712088316, 0.45212507743936053]
dtype: object
This achieves the same effect as you would get with a factory function:
In [90]: def wn_factory(weight_col, used_col):
...: return lambda x: np.average(x[used_col], weights=x[weight_col], axis=0)
...:
...:
In [91]: wn = wn_factory('NotionalTraded', ['Dummy_1', 'Dummy_2'])
In [92]: df.groupby('index').apply(wn)
Out[92]:
index
a [0.6479006083188169, 0.45070588200164324]
b [0.13874076610128128, 0.779035685092115]
c [0.6484653712088316, 0.45212507743936053]
dtype: object
The thoughts behind your solution are basically the same. The main challenge is that a weighted average by definition needs the weights column, while agg assumes that each column can be aggregated on its own.
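If you want to stay away from apply altogether, one common workaround (a sketch, not from the article above) is to pre-multiply the value columns by the weights, so that what is left really is a column-independent aggregation:
cols = ['Dummy_1', 'Dummy_2']
# numerator: sum of value * weight per group, computed column by column
weighted_sums = df[cols].mul(df['NotionalTraded'], axis=0).groupby(df['index']).sum()
# denominator: total weight per group
weight_totals = df.groupby('index')['NotionalTraded'].sum()
weighted_sums.div(weight_totals, axis=0)
# same numbers as Out[79], but every aggregation only ever sees one column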