1

I just want to know how to get the sum of the last 5th values based on id from every rows.

df:

id       values
-----------------
a        5
a        10
a        10
b        2
c        2
d        2
a        5
a        10
a        20
a        10
a        15
a        20

expected df:

id       values       sum(x.tail(5))
-------------------------------------
a        5            NaN
a        10           NaN
a        10           NaN
b        2            NaN
c        2            NaN
d        2            NaN
a        5            NaN
a        10           NaN
a        20           40
a        10           55
a        15           55
a        20           60

For simplicity, I'm trying to find the sum of values from the last 5th rows from every rows with id a only.

I tried to use code df.apply(lambda x: x.tail(5)), but that only showed me last 5 rows from the very last row of the entire df. I want to get the sum of last nth rows from every and each rows. Basically it's like rolling_sum for time series data.

Mike
  • 121
  • 1
  • 1
  • 9

2 Answers2

1

you can calculate the sum of the last 5 as like this:

df["rolling As"] = df[df['id'] == 'a'].rolling(window=5).sum()["values"]

(this includes the current row as one of the 5. not sure if that is what you want)

id  values  rolling As
0   a   5   NaN
1   a   10  NaN
2   a   10  NaN
3   b   2   NaN
4   c   2   NaN
5   d   5   NaN
6   a   10  NaN
7   a   20  55.0
8   a   10  60.0
9   a   10  60.0
10  a   15  65.0
11  a   20  75.0

If you don't want it included. you can shift

df["rolling"] = df[df['id'] == 'a'].rolling(window=5).sum()["values"].shift()

to give:

id  values  rolling
0   a   5   NaN
1   a   10  NaN
2   a   10  NaN
3   b   2   NaN
4   c   2   NaN
5   d   5   NaN
6   a   10  NaN
7   a   20  NaN
8   a   10  55.0
9   a   10  60.0
10  a   15  60.0
11  a   20  65.0
Christian Sloper
  • 7,440
  • 3
  • 15
  • 28
1

Try using groupby, transform, and rolling:

df['sum(x.tail(5))'] = df.groupby('id')['values']\
                         .transform(lambda x: x.rolling(5, min_periods=5).sum().shift())

Output:

   id values  sum(x.tail(5))
1   a      5             NaN
2   a     10             NaN
3   a     10             NaN
4   b      2             NaN
5   c      2             NaN
6   d      2             NaN
7   a      5             NaN
8   a     10             NaN
9   a     20            40.0
10  a     10            55.0
11  a     15            55.0
12  a     20            60.0
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
  • Perfect. Thank you – Mike Oct 29 '18 at 21:30
  • @Mike Christian solutions well if you only looking for a. If b or c has more than 5 entries, you'd not got the tailing sum. This solution would calculate the trailing sum for all ids. – Scott Boston Oct 29 '18 at 21:33