Set values for a column in a pandas Dataframe using values from a different row

Question

I have a pandas dataframe, df:

df = pd.DataFrame({
         'sku': ['A', 'A', 'B', 'B', 'C', 'C'],
         'date': ['2022-08-29', '2022-08-30', '2022-08-29', 
                  '2022-08-30', '2022-08-29', '2022-08-30'],
         'End': [100, 110, 0, 210, 300, 310]}) ``` 

   sku      date    End
0   A   2022-08-29  100
1   A   2022-08-30  110
2   B   2022-08-29  0
3   B   2022-08-30  210
4   C   2022-08-29  300
5   C   2022-08-30  310

I want today's starting value to be yesterday's end value.

So, I want to create a column 'Start' such as: Start = End, for the same sku, but for the previous date's End value (different row). This should apply only to the rows where there is a previous date for that sku.

The result should look like this:

   sku      date     End  Start
0   A   2022-08-29   100    
1   A   2022-08-30   110   100
2   B   2022-08-29   0  
3   B   2022-08-30   210    0
4   C   2022-08-29   300    
5   C   2022-08-30   310   300

In my real dataset, I have values for 'End' for every date (no missing values, just 0's), so this should apply to every date except the first one. The first date is the same for every sku.

I've tried iterating over the DataFrame using for loops, but it's inefficient and ugly.

Is there a way to do this using a pandas function like assign or apply?

score 0 · Accepted Answer · answered Sep 07 '22 at 14:22

0

IIUC

df['B']=df.groupby('sku')['A'].shift(1).fillna('')
df

    sku     date    A   B
0   A   2022-08-29  100     
1   A   2022-08-30  110     100.0
2   B   2022-08-29  0   
3   B   2022-08-30  210     0.0
4   C   2022-08-29  300     
5   C   2022-08-30  310     300.0

answered Sep 07 '22 at 14:22

Naveed

11,495
2
14
21

Thanks @Naveed! Another alternative to get the same result is using the parameter ```fill_value=''``` in ```.shift()```: ```df['B']=df.groupby('sku')['A'].shift(1, fill_value='') ``` – Martin Clausse Sep 07 '22 at 15:54

score 0 · Answer 2 · answered Sep 07 '22 at 14:23

Do this :

df['B'] = [None] + list(df['A'].values)[:df.shape[0]-1]

output:

  sku        date    A      B
0   A  2022-08-29  100    NaN
1   A  2022-08-30  110  100.0
2   B  2022-08-29    0  110.0
3   B  2022-08-30  210    0.0
4   C  2022-08-29  300  210.0
5   C  2022-08-30  310  300.0

Set values for a column in a pandas Dataframe using values from a different row

2 Answers2