Shorter way to create new column in groupby() based on previous n rows

Question

I have a sorted pandas DataFrame that I group by one column, and within each group I create two new columns: one built from the previous 4 rows plus the current row, and one from the next row.

import pandas as pd

data_test = {'nr':[1,1,1,1,1,6,6,6,6,6,6,6],
            'val':[11,12,13,14,15,61,62,63,64,65,66,67]}
df_test = pd.DataFrame(data_test, columns=['nr','val'])

print(df_test)

which produces the following frame:

   nr  val
0    1   11
1    1   12
2    1   13
3    1   14
4    1   15
5    6   61
6    6   62
7    6   63
8    6   64
9    6   65
10   6   66
11   6   67

The following code groups by 'nr' and builds one column holding, for each row, the previous 4 values of 'val' in the group plus the current value. It also builds an extra column holding, for each row, the next value of 'val' in the group.

df_test['past4'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(4).fillna(0))
df_test['past3'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(3).fillna(0))
df_test['past2'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(2).fillna(0))
df_test['past1'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(1).fillna(0))
df_test['future'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(-1).fillna(0))
df_test['amounts'] = df_test[['past4', 'past3','past2','past1','val']].values.tolist()
df_test.drop(columns = ['past4', 'past3', 'past2', 'past1'], inplace = True)
df_test

    nr  val future  amounts
0   1   11  12  [0, 0, 0, 0, 11]
1   1   12  13  [0, 0, 0, 11, 12]
2   1   13  14  [0, 0, 11, 12, 13]
3   1   14  15  [0, 11, 12, 13, 14]
4   1   15  0   [11, 12, 13, 14, 15]
5   6   61  62  [0, 0, 0, 0, 61]
6   6   62  63  [0, 0, 0, 61, 62]
7   6   63  64  [0, 0, 61, 62, 63]
8   6   64  65  [0, 61, 62, 63, 64]
9   6   65  66  [61, 62, 63, 64, 65]
10  6   66  67  [62, 63, 64, 65, 66]
11  6   67  0   [63, 64, 65, 66, 67]
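The per-group shift is what keeps values from leaking across group boundaries. A minimal comparison on a hypothetical four-row frame (not the question's data) shows the difference between a plain shift and a grouped one:

```python
import pandas as pd

# Hypothetical toy frame: two groups of two rows each
df = pd.DataFrame({'nr': [1, 1, 6, 6], 'val': [11, 12, 61, 62]})

# A plain shift ignores the groups: the first row of group 6 receives 12 from group 1
plain = df['val'].shift(1).fillna(0)

# A grouped shift restarts at each group boundary, so that row gets the fill value 0
grouped = df.groupby('nr')['val'].shift(1).fillna(0)

print(plain.tolist())    # [0.0, 11.0, 12.0, 61.0]
print(grouped.tolist())  # [0.0, 11.0, 0.0, 61.0]
```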

I'm sure the list column 'amounts' can be built more simply, probably with a one-liner. How can I do this?
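A possibly shorter variant, sketched under the assumption that the same df_test is used: SeriesGroupBy.shift already shifts within each group, so the repeated transform/lambda pairs can be replaced by a single list comprehension over the shifts. The names n and g are illustrative and do not come from the question.

```python
import pandas as pd

data_test = {'nr': [1, 1, 1, 1, 1, 6, 6, 6, 6, 6, 6, 6],
             'val': [11, 12, 13, 14, 15, 61, 62, 63, 64, 65, 66, 67]}
df_test = pd.DataFrame(data_test)

n = 4  # illustrative: number of previous rows to include
g = df_test.groupby('nr')['val']

# SeriesGroupBy.shift shifts within each group, so no transform/lambda is needed
df_test['future'] = g.shift(-1).fillna(0).astype(int)

# One column per shift (n, n-1, ..., 0) side by side, then one list per row
df_test['amounts'] = (
    pd.concat([g.shift(i) for i in range(n, -1, -1)], axis=1)
      .fillna(0)
      .astype(int)
      .values.tolist()
)
print(df_test)
```

pd.concat of the shifted Series yields several columns all labelled 'val'; that is harmless here because only the underlying values are used.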

jezrael · Answer 1 · 2021-02-09T13:48:51.750

Use a custom function to create the nested lists:

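import numpy as np
import pandas as pd

def f(x):
    x = x.copy()  # work on a copy of the group rather than the slice pandas passes in
    # list comprehension with shifts by 4, 3, 2, 1, 0
    L = [x['val'].shift(i).fillna(0) for i in range(4, -1, -1)]
    # next value in the group; 0 for the last row
    x['future'] = x['val'].shift(-1).fillna(0).astype(int)
    # np.array(L) has shape (5, n); transposing gives n lists of 5 values
    x['amounts'] = pd.Series(np.array(L).astype(int).T.tolist(), index=x.index)
    return x

# group_keys=False keeps the flat index; pandas 2.x would otherwise add 'nr' as an index level
df_test = df_test.groupby(['nr'], group_keys=False).apply(f)
print(df_test)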
    nr  val  future               amounts
0    1   11      12      [0, 0, 0, 0, 11]
1    1   12      13     [0, 0, 0, 11, 12]
2    1   13      14    [0, 0, 11, 12, 13]
3    1   14      15   [0, 11, 12, 13, 14]
4    1   15       0  [11, 12, 13, 14, 15]
5    6   61      62      [0, 0, 0, 0, 61]
6    6   62      63     [0, 0, 0, 61, 62]
7    6   63      64    [0, 0, 61, 62, 63]
8    6   64      65   [0, 61, 62, 63, 64]
9    6   65      66  [61, 62, 63, 64, 65]
10   6   66      67  [62, 63, 64, 65, 66]
11   6   67       0  [63, 64, 65, 66, 67]
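The transpose inside f() is the step that turns per-shift arrays into per-row lists. A toy example, assuming three shifts on a four-value Series rather than the five shifts used above:

```python
import numpy as np
import pandas as pd

val = pd.Series([11, 12, 13, 14])

# One Series per shift (2, 1, 0), like the list L built inside f()
L = [val.shift(i).fillna(0) for i in range(2, -1, -1)]

arr = np.array(L).astype(int)
print(arr.shape)  # (3, 4): one row per shift, one column per original row

# Transposing gives one row per original row and one column per shift
rows = arr.T.tolist()
print(rows)       # [[0, 0, 11], [0, 11, 12], [11, 12, 13], [12, 13, 14]]
```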

great answer. I was thinking of using `index.repeat` and `reindex(4)` to create a new df and take the product of the two data frames by each unique `nr` and `value`, but this is more succinct and probably more memory-efficient too. — Umar.H, Feb 09 '21 at 16:46

score 1 · Answer 2 · answered Feb 09 '21 at 13:36

Moving your block into a function makes the code more modular and lighter.

In this example we pass reversed(range(5)) as shift_values, which yields the shifts 4, 3, 2, 1, 0.

import pandas as pd

data_test = {'nr':[1,1,1,1,1,6,6,6,6,6,6,6],
            'val':[11,12,13,14,15,61,62,63,64,65,66,67]}
df_test = pd.DataFrame(data_test, columns = ['nr','val'])

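def generate_past(df, shift_values):
    # one row per shift value; the columns are the original row labels
    shifted = pd.DataFrame([df.groupby('nr')['val'].transform(lambda x: x.shift(shift_value).fillna(0)) for shift_value in shift_values])
    # transpose so each original row becomes one list of shifted values
    return shifted.T.values.tolist()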
        
df_test['future'] = df_test.groupby(['nr'])['val'].transform(lambda x: x.shift(-1).fillna(0))
df_test['amounts'] = generate_past(df_test, reversed(range(5)))
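One caveat worth checking: reversed(range(5)) is an iterator, so generate_past can consume it only once. If the same object were passed to a second call, it would produce no shifts at all. A small sketch of the behaviour, with range(4, -1, -1) as a reusable alternative:

```python
shift_values = reversed(range(5))

# reversed() returns an iterator, not a list
print(list(shift_values))  # [4, 3, 2, 1, 0]

# The iterator is now exhausted; a second pass yields nothing
print(list(shift_values))  # []

# range(4, -1, -1) gives the same order and can be iterated any number of times
shifts = range(4, -1, -1)
print(list(shifts), list(shifts))  # [4, 3, 2, 1, 0] [4, 3, 2, 1, 0]
```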

Pygirl · Answer 3 · 2021-02-09T14:30:48.863

You can try this (the same idea as jezrael's answer) but without using apply. It is not a great approach, since it builds a new DataFrame.

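import pandas as pd

groups = []
for _, grp in df_test.groupby('nr'):
    grp = grp.reset_index(drop=True)
    grp['future'] = grp['val'].shift(-1).fillna(0).astype(int)
    # for row k, shifting by len(grp)-1-k and keeping the last 5 values
    # gives [val[k-4], ..., val[k]], padded with 0 before the group start
    grp['amounts'] = pd.Series([grp['val'].shift(i).fillna(0).values[-5:]
                                for i in range(len(grp) - 1, -1, -1)])
    groups.append(grp)
# DataFrame.append was removed in pandas 2.0, so collect the groups and concat once
df_new = pd.concat(groups, ignore_index=True)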

df_new:

    nr  val future  amounts
0   1   11  12  [0.0, 0.0, 0.0, 0.0, 11.0]
1   1   12  13  [0.0, 0.0, 0.0, 11.0, 12.0]
2   1   13  14  [0.0, 0.0, 11.0, 12.0, 13.0]
3   1   14  15  [0.0, 11.0, 12.0, 13.0, 14.0]
4   1   15  0   [11, 12, 13, 14, 15]
5   6   61  62  [0.0, 0.0, 0.0, 0.0, 61.0]
6   6   62  63  [0.0, 0.0, 0.0, 61.0, 62.0]
7   6   63  64  [0.0, 0.0, 61.0, 62.0, 63.0]
8   6   64  65  [0.0, 61.0, 62.0, 63.0, 64.0]
9   6   65  66  [61.0, 62.0, 63.0, 64.0, 65.0]
10  6   66  67  [62.0, 63.0, 64.0, 65.0, 66.0]
11  6   67  0   [63, 64, 65, 66, 67]
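The mix of floats and integers in this output appears to come from the dtypes pandas chooses: every non-zero shift introduces NaN (upcasting to float), while the final row of each group comes from shift(0), which introduces none. A minimal illustration on a hypothetical three-value Series:

```python
import pandas as pd

s = pd.Series([11, 12, 13], dtype='int64')

# A non-zero shift introduces NaN, so the result is upcast to float64
print(s.shift(1).fillna(0).dtype)  # float64

# A zero shift introduces no NaN, so the integer dtype is kept
print(s.shift(0).fillna(0).dtype)  # int64

# Casting back after fillna gives integers in every row
print(s.shift(1).fillna(0).astype(int).tolist())  # [0, 11, 12]
```

Adding .astype(int) after fillna(0) inside the list comprehension would make every row integer-valued.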
