I am currently working with a panel data of financial information on pandas, therefore working with different companies across different years. I am trying to generate a column of the $ invested shifted by 2 time periods. Hence, reporting the value of time t also at t+2.
Normally, to lag a variable, I would use df.groupby('tic')['investments'].shift(2)
, however unfortunately my data is a bit spotty and therefore for the same company I might have a gap in the years. Just to give you an idea, here is an example of my df:
datadate fyear tic invest
0 31/12/1998 1997 AAPL 12.3
1 31/12/1999 1998 AAPL 14.5
2 31/12/2002 2002 AAPL 9.7
3 31/12/2003 2003 AAPL 21.8
4 31/12/2004 2004 AAPL 21.4
5 31/12/2005 2005 AAPL 18.9
6 31/05/2008 2008 TSLA 11.5
7 31/05/2009 2009 TSLA 13.7
8 31/05/2010 2010 TSLA 19.4
9 31/05/2011 2011 TSLA 14.5
10 31/05/2014 2013 TSLA 14.8
.. ... .. .. ..
Therefore, what I am trying to achieve is the following:
datadate fyear tic invest l2_invest
0 31/12/1998 1997 AAPL 12.3 NaN
1 31/12/1999 1998 AAPL 14.5 NaN
2 31/12/2002 2002 AAPL 9.7 NaN
3 31/12/2003 2003 AAPL 21.8 NaN
4 31/12/2004 2004 AAPL 21.4 9.7
5 31/12/2005 2005 AAPL 18.9 21.8
6 31/05/2008 2008 TSLA 11.5 NaN
7 31/05/2009 2009 TSLA 13.7 NaN
8 31/05/2010 2010 TSLA 19.4 11.5
9 31/05/2011 2011 TSLA 14.5 13.7
10 31/05/2014 2013 TSLA 14.8 19.4
.. ... .. .. ..
Thank you in advance for your help :)