Just an example... I have a 5M row dataframe full of runners and a log of how many miles they ran on any given day. If a runner doesn't run on a day, there is no 0 entry.
Runner | Day | Miles |
---|---|---|
A | 1 | 3 |
A | 3 | 4 |
A | 8 | 4 |
A | 9 | 2 |
I'm trying to get a field for how much they ran in the last 7 calendar days (not including the current row), as well as the next 7 calendar days (including current row).
Runner | Day | Miles | Last 7 | Next 7 |
---|---|---|---|---|
A | 1 | 3 | null | 7 |
A | 3 | 4 | 3 | 10 |
A | 8 | 4 | 7 | 6 |
A | 9 | 2 | 10 | 2 |
I think can get the answer for a single row.
Next 7 for row 1 for example...
df.loc[(df['Runner'] == 'A') & (df['day'] >= 1) & (df['day'] < 1+7)]['Miles'].sum()
I'm just unsure how to apply this type of thing to every row in an efficient way. The methods I've seen so far only utilize data from the same row. Any help?