2

I have a data frame like this,

df
col1    col2
  A       1
  B       3
  C       2
  D       5
  E       6
  F       8
  G       10

I want to add previous and next n values of a particular value of col2 and store it into a new column,

So, If n=2, then the data frame should look like,

 col1    col2    col3
  A       1       6  (only below 2 values are there no upper values, so adding 3 numbers)
  B       3      11 (adding one prev, current and next two)
  C       2      17(adding all 4 values)
  D       5      24(same as above)
  E       6      31(same as above)
  F       8      29(adding two prev and next one as only one is present)
  G       10     24(adding with only prev two values)

When previous or next 2 values are not found adding whatever values are available. I can do it using a for loop, but the execution time will be huge, looking for some pandas shortcuts do do it most efficiently.

smci
  • 32,567
  • 20
  • 113
  • 146
Kallol
  • 2,089
  • 3
  • 18
  • 33
  • whats the formula to get col3 value please? I am unable to understand the logic for `5, 11, 17, 24.... ` – Joe Ferndz Oct 10 '20 at 19:48
  • @JoeFerndz , 5=1+3+2 (as only 3 values are present) 11=1+3+2+5 17=1+3+2+5+6 24=3+2+5+6+8 – Kallol Oct 10 '20 at 19:49
  • @Kallol can you give mutliple examples for each column, including where it starts to decrease in column 3. Also, an you double-check the numbers. 5 does not equal 1 + 3 + 2 as in your first example. – David Erickson Oct 10 '20 at 19:51
  • 6 = (1 + 3 values (1 + 3 + 2)) = understood. 11 = 6 + 2 values (1 + 3). Shouldnt it be 10 instead of 11. Next 17 = 11 + 2 values (1 + 3) = Shouldnt it be 16? Is this what you are trying to do? I guess these numbers got mixed up coz you started with 5 and then changed first row to 6. Am I getting the logic right? – Joe Ferndz Oct 10 '20 at 19:54
  • @DavidErickson explained the logic, check the edited one – Kallol Oct 10 '20 at 19:54

1 Answers1

5

You can use the rolling method.

import pandas as pd
df = pd.read_json('{"col1":{"0":"A","1":"B","2":"C","3":"D","4":"E","5":"F","6":"G"},"col2":{"0":1,"1":3,"2":2,"3":5,"4":6,"5":8,"6":10}}')

df['col3'] = df['col2'].rolling(5, center=True, min_periods=0).sum()
col1    col2    col3
0   A   1   6.0
1   B   3   11.0
2   C   2   17.0
3   D   5   24.0
4   E   6   31.0
5   F   8   29.0
6   G   10  24.0
nocibambi
  • 2,065
  • 1
  • 16
  • 22
  • Yes my solution would have been `df['col3'] = (df['col2'].shift(2).fillna(0) + df['col2'].shift(1).fillna(0) + df['col2'] + df['col2'].shift(-1).fillna(0) + df['col2'].shift(-2).fillna(0))` ... this is much better – David Erickson Oct 10 '20 at 20:04
  • Thank You for sharing the answer. I didn't understand. Will try to learn more about rolling to figure out what's happening here. – Joe Ferndz Oct 10 '20 at 20:06
  • 1
    @Joe Ferndz, `rolling(5)` takes the previous 4 values plus current row... however, the `center=True` parameter means that the middle value is at current row... so it would grab previous 2 values, current, and next 2 values making this an ideal solution. – David Erickson Oct 10 '20 at 20:09