0

I have a dataframe as below:

d = {'col1': ["A", "A", "A", "B", "B", "B"], 'col2': [2015, 2016, 2017, 2015, 2016, 2017], 'col3': [10, 20, 25, 10, 12, 14]}

I would like to get the difference of col3 by col1 and col2. Such that, col1 represent companies, while col2 years and col3 stock price. I am trying to get the stock price difference each year. Such that, the output should be:

d2= {'col4': ['nan', 10, 5, 'nan', 2, 4]} 

Thank you for your suggestions in advance.

Note: we cannot re-index the dataframe, we have col5, col6, col7... etc.. many other columns.

hilo
  • 116
  • 11

1 Answers1

2

groupby + diff:

df.groupby('col1').col3.transform('diff')

0     NaN
1    10.0
2     5.0
3     NaN
4     2.0
5     2.0
Name: col3, dtype: float64
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • Hi Psidom, thanks for your quick reply. we cannot re-index the dataframe, we have col5, col6, col7... etc.. many other columns. Do you have any comment on this? – hilo Jan 13 '22 at 23:44
  • 2
    What do you mean cannot re-index ? `df['col4'] = df.groupby('col1').col3.transform('diff')` should automatically align the index for you. – Psidom Jan 13 '22 at 23:46
  • 1
    Thanks. I assumed wrong, once I executed it gave the desired results. thanks for sparing time and such a nice simple approach. kudos! – hilo Jan 14 '22 at 01:02