0

I have a dataframe (df) with 2 columns and 1 index.

Index is datetime index and is in format of 2001-01-30 .... etc and the index is ordered by DATE and there are thousands of identical dates (and is monthly dates). Column A is company name (which corresponds to the date), Column B are share prices for the company names in column A for the date in the Index.

Now there are multiple companies in Column A for each date, and companies do vary over time (so the data is not predictable fully).

I want to create a Column C which lags all the prices which are in B forward to the next date (as per in the index).

A basic .shift() would not work, as I would require all the prices to be shifted based on the condition that the company is still there at the next point in the index.

I want a column C which shifts B forward by 1, and a column D which shifts it back by 1.

I have been stuck on this for a while, somebody very smart please help.

Thanks

1 Answers1

1

Consider the example dataframe df below

np.random.seed([3,1415])
df = pd.concat(dict(
        A=pd.Series(np.random.rand(10), pd.date_range('2016-09-30', periods=10)),
        B=pd.Series(np.random.rand(7), pd.date_range('2016-09-25', periods=7)),
        C=pd.Series(np.random.rand(10), pd.date_range('2016-09-20', periods=10)),
        D=pd.Series(np.random.rand(8), pd.date_range('2016-10-30', periods=8)),
        E=pd.Series(np.random.rand(12), pd.date_range('2016-10-25', periods=12)),
        F=pd.Series(np.random.rand(14), pd.date_range('2016-08-30', periods=14)),

    )).rename_axis(['ColumnA', None]).reset_index('ColumnA', name='ColumnB')

print(df.head(10))

           ColumnA   ColumnB
2016-09-30       A  0.444939
2016-10-01       A  0.407554
2016-10-02       A  0.460148
2016-10-03       A  0.465239
2016-10-04       A  0.462691
2016-10-05       A  0.016545
2016-10-06       A  0.850445
2016-10-07       A  0.817744
2016-10-08       A  0.777962
2016-10-09       A  0.757983

use groupby + shift

d1 = df.set_index('ColumnA', append=True)
g = d1.groupby(level='ColumnA').ColumnB
keys = ['Forward', 'Back']
new_df = d1.join(pd.concat([g.shift(i) for i in [-1, 1]], axis=1, keys=keys))
print(new_df.query('ColumnA == "A"').head(10))

                     ColumnB   Forward      Back
           ColumnA                              
2016-09-30 A        0.444939  0.407554       NaN
2016-10-01 A        0.407554  0.460148  0.444939
2016-10-02 A        0.460148  0.465239  0.407554
2016-10-03 A        0.465239  0.462691  0.460148
2016-10-04 A        0.462691  0.016545  0.465239
2016-10-05 A        0.016545  0.850445  0.462691
2016-10-06 A        0.850445  0.817744  0.016545
2016-10-07 A        0.817744  0.777962  0.850445
2016-10-08 A        0.777962  0.757983  0.817744
2016-10-09 A        0.757983       NaN  0.777962
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • 1
    To whoever down voted this answer: I'd appreciate feedback so I can either improve the answer or delete it. – piRSquared Jan 14 '17 at 17:54
  • IMO there is absolutely no reason for downvoting this answer! – MaxU - stand with Ukraine Jan 14 '17 at 19:59
  • Thank you, I am trying your example but for the first part I am unable to create the dataframe as it says 'list' object is not callable' which I try the rename_axis. Additionally one more thing, currently my data is in the format of All the dates being together with their corresponding company name in Column A, eg for the first 200 entries of dataframe the date is the same with different companies, any idea to change my format to your format? Thanks again for your answer and I look forward to hearing from you piRSquared – MysterioProgrammer91 Jan 14 '17 at 21:05