Pandas version 0.25 supports "Named Aggregation" via the agg function and namedtuples. You pass (column, aggregator) pairs, as the documentation describes. It also says:
If your aggregation functions require additional arguments, partially apply them with functools.partial().
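For reference, here is a minimal sketch of named aggregation where functools.partial binds a constant extra argument, which is the case that documentation sentence covers. The np.percentile aggregator, q=50, and the output column names are illustrative choices, not part of the original question.

```python
import functools

import numpy as np
import pandas as pd

t = pd.DataFrame({'bucket': ['a', 'a', 'b', 'b', 'b'],
                  'weight': [2, 3, 1, 4, 3],
                  'qty': [100, 500, 200, 800, 700]})

# Each keyword becomes an output column; the value is a (column, aggregator) pair.
# pd.NamedAgg is the namedtuple form of the same pair.
summary = t.groupby('bucket').agg(
    MAX_QTY=pd.NamedAgg(column='qty', aggfunc='max'),
    # partial fixes a constant argument (q=50, i.e. the median) before pandas
    # calls the function once per group.
    MEDIAN_QTY=('qty', functools.partial(np.percentile, q=50)),
)
print(summary)
```

This works because q is the same constant for every group. A per-row column cannot be bound this way.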
I would like to apply this principle to compute a weighted average (alongside a simple count and average). My input table is:
import pandas as pd
t = pd.DataFrame({'bucket': ['a', 'a', 'b', 'b', 'b'],
                  'weight': [2, 3, 1, 4, 3],
                  'qty': [100, 500, 200, 800, 700]})
and my query fails:
import functools
import numpy as np
t.groupby('bucket').agg(
    NR=('bucket', 'count'),
    AVG_QTY=('qty', np.mean),
    W_AVG_QTY=('qty', functools.partial(np.average, weights='weight'))
)
with an error message:
TypeError: 1D weights expected when shapes of a and weights differ.
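One way to check what the partial actually binds is to inspect it and call it on one group's quantities by hand. This is a diagnostic sketch. The exact TypeError message depends on the NumPy version and may differ from the one above.

```python
import functools

import numpy as np

# Bind weights the same way as in the failing query.
w_avg = functools.partial(np.average, weights='weight')

# The partial stores the literal string 'weight'; nothing ties it to the DataFrame column.
print(w_avg.keywords)  # {'weights': 'weight'}

# pandas calls the aggregator once per group with that group's 'qty' values,
# so np.average receives a numeric array plus a string and rejects it.
raised = False
try:
    w_avg(np.array([100, 500]))
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```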
I assume the problem is that the bound parameter is another column rather than a constant? How can I make this work without the workaround that uses apply and a lambda expression returning a Series?
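Two possible approaches are sketched below, assuming t has a unique index (the default RangeIndex here). The first keeps named aggregation and looks up each group's weights through the group's index labels inside a lambda. The second is a swapped-in technique, not taken from the documentation: it precomputes weight * qty and divides the group sums. In both, NR is counted on qty instead of on the grouping column. Because qty has no missing values, the counts are the same. The helper names WQ, WQ_SUM and W_SUM are hypothetical.

```python
import numpy as np
import pandas as pd

t = pd.DataFrame({'bucket': ['a', 'a', 'b', 'b', 'b'],
                  'weight': [2, 3, 1, 4, 3],
                  'qty': [100, 500, 200, 800, 700]})

# Option 1: the aggregator receives each group's 'qty' Series with its original
# row labels, so the matching weights can be looked up in t via x.index.
# Assumes t's index is unique.
result = t.groupby('bucket').agg(
    NR=('qty', 'count'),
    AVG_QTY=('qty', 'mean'),
    W_AVG_QTY=('qty', lambda x: np.average(x, weights=t.loc[x.index, 'weight'])),
)

# Option 2: weighted average = sum(weight * qty) / sum(weight) per group.
# WQ, WQ_SUM and W_SUM are hypothetical helper columns.
sums = t.assign(WQ=t['qty'] * t['weight']).groupby('bucket').agg(
    NR=('qty', 'count'),
    AVG_QTY=('qty', 'mean'),
    WQ_SUM=('WQ', 'sum'),
    W_SUM=('weight', 'sum'),
)
result2 = (sums.assign(W_AVG_QTY=sums['WQ_SUM'] / sums['W_SUM'])
               .drop(columns=['WQ_SUM', 'W_SUM']))

print(result)   # W_AVG_QTY: a -> 340.0, b -> 687.5
print(result2)  # same values as result
```

The second variant avoids calling a Python function for every group, which may matter on large frames. The first variant reads closer to the original query, but it depends on the lambda's closure over t.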