Given a data frame, I would like to write a function to calculate the weighted average by group for a given set of columns.
For example, I have a data frame
mydf = pd.DataFrame({
'group' : np.array(['A', 'B', 'C', 'D', 'E']*20),
'weight' : np.array(list(range(1,6))*20),
'x1' : np.random.uniform(100, 200, 100),
'x2' : np.random.uniform(200, 300, 100),
'x3' : np.random.uniform(300, 400, 100),
...
'x999': np.random.uniform(99900, 100000, 100),
})
I could use the following method to calculate the weighted average for each variable
wm = mydf \
.groupby('group') \
.apply(
lambda x: pd.Series({
'x1' : np.average(x['x1'], weights=x['weight']),
'x2' : np.average(x['x2'], weights=x['weight']),
'x3' : np.average(x['x3'], weights=x['weight']),
...
'x999' : np.average(x['x999'], weights=x['weight']),
})
) \
.reset_index()
The problem is I need to define a function to compute weighted avg for an arbitrary list of variables ,e.g. ['x11','x12','x16','x77'].
I know I can create a loop to process one variable at a time and then merge the results together, but it would be awfully inefficient. Is there a way to programmatically process a list of variables all at once inside the function?