
Hi, I have a panel data set that looks like this:

stock    date     time   spread1  weight  spread2 
VOD      01-01    9:05    0.01    0.03     ...
VOD      01-01    9.12    0.03    0.05     ...
VOD      01-01   10.04    0.02    0.30     ...
VOD      01-02   11.04    0.02    0.05
...       ...     ...     ....     ...
BAT      01-01   0.05     0.04    0.03
BAT      01-01   0.07     0.05    0.03
BAT      01-01   0.10     0.06    0.04

I want to calculate the weighted average of spread1 for each stock on each day. I can break the solution into several steps: apply groupby and agg to get the sum of spread1*weight per stock and day in one DataFrame, get the sum of weight per stock and day in a second DataFrame, then merge the two and divide to obtain the weighted average of spread1 (see the sketch below).
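
For reference, a rough sketch of the step-by-step approach I mean (untested; wspread1 and wavg are just placeholder names):

import pandas as pd

# Numerator and denominator per (stock, date), then divide.
tmp = df.assign(wspread1=df['spread1'] * df['weight'])
num = tmp.groupby(['stock', 'date'])['wspread1'].sum()   # sum of spread1*weight
den = tmp.groupby(['stock', 'date'])['weight'].sum()     # sum of weight
wavg = (num / den).rename('spread1_wavg').reset_index()  # one row per stock/date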

My question is: is there any simpler way to calculate the weighted average of spread1 here? I also have spread2, spread3 and spread4, so I want to write as little code as possible. Thanks.

FlyUFalcon

1 Answer


IIUC, you need to transform the result back to the original DataFrame, but using .transform with output that depends on two columns is tricky. Write your own function, passing both the spread Series s and the original DataFrame df so that the weights can be used too:

import numpy as np

def weighted_avg(s, df):
    # s holds the group's spread values; look up the matching weights by index.
    return np.average(s, weights=df.loc[df.index.isin(s.index), 'weight'])

df['spread1_avg'] = df.groupby(['stock', 'date']).spread1.transform(weighted_avg, df)

Output:

  stock   date   time  spread1  weight  spread1_avg
0   VOD  01-01   9:05     0.01    0.03     0.020526
1   VOD  01-01   9.12     0.03    0.05     0.020526
2   VOD  01-01  10.04     0.02    0.30     0.020526
3   VOD  01-02  11.04     0.02    0.05     0.020000
4   BAT  01-01   0.05     0.04    0.03     0.051000
5   BAT  01-01   0.07     0.05    0.03     0.051000
6   BAT  01-01   0.10     0.06    0.04     0.051000
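
As a quick sanity check, the VOD 01-01 figure can be reproduced directly from the three rows in that group:

np.average([0.01, 0.03, 0.02], weights=[0.03, 0.05, 0.30])
# (0.01*0.03 + 0.03*0.05 + 0.02*0.30) / (0.03 + 0.05 + 0.30) = 0.0078 / 0.38 ≈ 0.020526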

If needed for multiple columns:

gp = df.groupby(['stock', 'date'])
for col in [f'spread{i}' for i in range(1,5)]:
    df[f'{col}_avg'] = gp[col].transform(weighted_avg, df)

Alternatively, if you don't need to transform back and want one value per stock-date:

import pandas as pd

def my_avg2(gp):
    # Weighted average of every spread column in the group, weighted by 'weight'.
    avg = np.average(gp.filter(like='spread'), weights=gp.weight, axis=0)
    return pd.Series(avg, index=[col for col in gp.columns if col.startswith('spread')])

# Create some dummy data
df['spread2'] = df.spread1 + 1
df['spread3'] = df.spread1 + 12.1
df['spread4'] = df.spread1 + 1.13

df.groupby(['stock', 'date'])[['weight'] + [f'spread{i}' for i in range(1,5)]].apply(my_avg2)

#              spread1   spread2    spread3   spread4
#stock date                                          
#BAT   01-01  0.051000  1.051000  12.151000  1.181000
#VOD   01-01  0.020526  1.020526  12.120526  1.150526
#      01-02  0.020000  1.020000  12.120000  1.150000
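
If you later want these per-group values back on the original rows, a merge works too, e.g. (a sketch; out and the '_wavg' suffix are just illustrative names):

per_group = df.groupby(['stock', 'date'])[['weight'] + [f'spread{i}' for i in range(1,5)]].apply(my_avg2)
out = df.merge(per_group, left_on=['stock', 'date'], right_index=True, suffixes=('', '_wavg'))
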
ALollz