0

I have a dataframe, consisting of daily stock observations, date and PERMNO (Identifier). I want to resample the dataframe to only consist of observations for every 5th trading day for every stock. The dataframe looks something like the below:

[10610 rows x 3 columns]
     PERMNO       date       RET      gret     cumret_5d
0   10001.0 2010-01-04 -0.004856  0.995144           NaN
1   10001.0 2010-01-05 -0.005856  0.994144           NaN
2   10001.0 2010-01-06  0.011780  1.011780           NaN
3   10001.0 2010-01-07 -0.033940  0.966060           NaN
4   10001.0 2010-01-08  0.038150  1.038150  3.888603e-03
5   10001.0 2010-01-11  0.015470  1.015470  2.439321e-02
6   10001.0 2010-01-12 -0.004760  0.995240  2.552256e-02
7   10001.0 2010-01-13 -0.003350  0.996650  1.018706e-02
8   10001.0 2010-01-14 -0.001928  0.998072  4.366128e-02
9   10001.0 2010-01-15 -0.007730  0.992270 -2.462285e-03
10  10002.0 2010-01-05 -0.011690  0.988310           NaN
11  10002.0 2010-01-06  0.011826  1.011826           NaN
12  10002.0 2010-01-07 -0.021420  0.978580           NaN
13  10002.0 2010-01-08  0.004974  1.004974           NaN
14  10002.0 2010-01-11 -0.023760  0.976240 -3.992141e-02
15  10002.0 2010-01-12  0.002028  1.002028 -2.659527e-02
16  10002.0 2010-01-13  0.009780  1.009780 -2.856358e-02
17  10002.0 2010-01-14  0.017380  1.017380  9.953183e-03
18  10002.0 2010-01-15 -0.008865  0.991135 -3.954383e-03
19  10002.0 2010-02-18 -0.006958  0.993042  1.318849e-02

The result I want to produce is:

[10610 rows x 3 columns]
     PERMNO       date       RET      gret     cumret_5d
4   10001.0 2010-01-08  0.038150  1.038150  3.888603e-03
9   10001.0 2010-01-15 -0.007730  0.992270 -2.462285e-03
13  10002.0 2010-01-08  0.004974  1.004974           NaN
18  10002.0 2010-01-15 -0.008865  0.991135 -3.954383e-03

I.e I want to keep observations for dates (2010-01-08), (2010-01-15), (2010-01-22)... continuing up until today. The problem is that not every stock contains the same dates (some may have its first trading day in the middle of a month). Further, every 5th trading day is not continuously every 7th day due to holidays.

I have tried using

crsp_daily = crsp_daily.groupby('PERMNO').resample('5D',on='date')

Which just resulted in an empty dataframe:

Out:
DatetimeIndexResamplerGroupby [freq=<Day>, axis=0, closed=left, label=left, convention=e, origin=start_day]

Any ideas on how to solve this problem?

AcK
  • 2,063
  • 2
  • 20
  • 27
Sebastian
  • 27
  • 5

2 Answers2

0

You could loop through the values of PERMNO and then for each subset use .iloc[::5] to get every 5th row. Then concat each resulting DataFrame together:

dfs = []
for val in crsp_daily['PERMNO'].unique():
    dfs.append(crsp_daily[crsp_daily['PERMNO'] == val].iloc[::5])

result = pd.concat(dfs)
jprebys
  • 2,469
  • 1
  • 11
  • 16
  • Thank you for the reply. The problem is that not all stocks have the same first trading day, so it won’t match with every 5th day from the first day in the dataset. For example Permno = 10002 starts the 5th instead of the 4th – Sebastian Feb 02 '23 at 17:03
0

For future reference, I solved it by:

def remove_nonrebalancing_dates(df,gap):
    count = pd.DataFrame(df.set_index('date').groupby('date'), columns=['date', 'tmp']).reset_index()

    del count['tmp']

    count['index'] = count['index'] + 1

    count = count[(count['index'].isin(range(gap, len(count['index']) + 1, gap)))]

    df = df[(df['date'].isin(count['date']))]

    return df

dataframe with containing only every 5th trading day can then be defined as:

df = remove_nonrebalancing_dates(df,5)
Sebastian
  • 27
  • 5