0

I am having a hard time figuring out how to get "rolling weights" based off of one of my columns, then factor these weights onto another column.

I've tried groupby.rolling.apply (function) on my data but the main problem is just conceptualizing how I'm going to take a running/rolling average of the column I'm going to turn into weights, and then factor this "window" of weights onto another column that isn't rolled.

I'm also purposely setting min_period to 1, so you'll notice my first two rows in each group final output "rwag" mirror the original.

W is the rolling column to derive the weights from. B is the column to apply the rolled weights to. Grouping is only done on column a.

df is already sorted by a and yr.

def wavg(w,x):
    return (x * w).sum() / w.sum()

n=df.groupby(['a1'])[['w']].rolling(window=3,min_periods=1).apply(lambda x:  wavg(df['w'],df['b']))


Input:

id | yr  |   a  |   b    |   w
---------------------------------
0  | 1990 |  a1 |   50   |  3000   
1  | 1991 |  a1 |   40   |  2000   
2  | 1992 |  a1 |   10   |  1000   
3  | 1993 |  a1 |   20   |  8000         
4  | 1990 |  b1 |   10   |  500    
5  | 1991 |  b1 |   20   |  1000   
6  | 1992 |  b1 |   30   |  500    
7  | 1993 |  b1 |   40   |  4000        


Desired output:

id | yr  |   a  |   b  |   rwavg
---------------------------------
 0   1990   a1    50      50
 1   1991   a1    40      40
 2   1992   a1    10      39.96
 3   1993   a1    20      22.72
 4   1990   b1    10      10 
 5   1991   b1    20      20
 6   1992   b1    30      20
 7   1993   b1    40      35.45
cs95
  • 379,657
  • 97
  • 704
  • 746
denutza
  • 15
  • 5

1 Answers1

1

apply with rolling usually have some wired behavior

df['Weight']=df.b*df.w
g=df.groupby(['a']).rolling(window=3,min_periods=1)
g['Weight'].sum()/g['w'].sum()
df['rwavg']=(g['Weight'].sum()/g['w'].sum()).values

Out[277]: 
a    
a1  0    50.000000
    1    46.000000
    2    40.000000
    3    22.727273
b1  4    10.000000
    5    16.666667
    6    20.000000
    7    35.454545
dtype: float64
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Do you know the reason why window size is 3, but the 2nd row is getting calculated? – denutza Dec 26 '18 at 15:43
  • @denutza since you doing with min_period =1 which means less that the window size still calculated – BENY Dec 26 '18 at 15:45
  • yeah, but the first row in "a" is matching, but the second row should also be matching original (since win size=3), but its getting the rolling avg instead --See my desired output.....unless im missing something – denutza Dec 26 '18 at 15:50
  • 1
    I get the meaning now..my desired output was slightly incorrect. – denutza Dec 26 '18 at 16:26