2

I have aggregated series from data frame using 'groupby' function as such code.

df.groupby(['username', 'week'])['variable'].agg(sum)
username     week     
zzzzoooeeeee 0     1.0
             1     NaN
             2     NaN
             3     NaN
             4     NaN

My original dataframe should look like this.

username     week    colA   colB        
zzzzoooeeeee  0       10     a   
zzzzoooeeeee  0       15     b   
zzzzoooeeeee  0       17     t    
zzzzoooeeeee  1       7      t   
zzzzoooeeeee  1       12     v   
zzzzoooeeeee  2       8      z   
zzzzoooeeeee  3       2      o   
zzzzoooeeeee  4       4      p   

I'd like to merge my original data frame with the variable in the third column of the series based on 'username' and 'week'.

The output after merging should be like

username     week    colA   colB        
zzzzoooeeeee  0       10     a      1.0
zzzzoooeeeee  0       15     b      1.0
zzzzoooeeeee  0       17     t      1.0
zzzzoooeeeee  1       7      t     NaN
zzzzoooeeeee  1       12     v     NaN
zzzzoooeeeee  2       8      z     NaN
zzzzoooeeeee  3       2      o     NaN
zzzzoooeeeee  4       4      p     NaN

Can anyone please help this?

B.Lee
  • 71
  • 6
  • 1
    could you share first 5-10 rows of `df` and the final outcome you want? –  Jan 18 '22 at 04:25

1 Answers1

1

Using your df example, which does not have 'variable' column so using 'colA' instead, we can achieve what you want using merge:

df2 = df.groupby(['username', 'week'])['colA'].agg(sum).rename('colA_sum').reset_index()
df.merge(df2, on = ['username','week'])

output

    username        week    colA  colB      colA_sum
--  ------------  ------  ------  ------  ----------
 0  zzzzoooeeeee       0      10  a               42
 1  zzzzoooeeeee       0      15  b               42
 2  zzzzoooeeeee       0      17  t               42
 3  zzzzoooeeeee       1       7  t               19
 4  zzzzoooeeeee       1      12  v               19
 5  zzzzoooeeeee       2       8  z                8
 6  zzzzoooeeeee       3       2  o                2
 7  zzzzoooeeeee       4       4  p                4

The same, by the way, is achievable without merge:

df['colA_sum'] = df.groupby(['username', 'week'])['colA'].transform(sum)
piterbarg
  • 8,089
  • 2
  • 6
  • 22