2

I have a pandas dataframe with high frequency (second or minutely) stock data.

e.g. 
                           Date      Time     Open  ...      Low    Close  Volume
DT                                                  ...                          
2020-10-09 09:30:00  10/09/2020  09:30:00  3432.50  ...  3432.25  3432.50      37
2020-10-09 09:30:01  10/09/2020  09:30:01  3432.50  ...  3432.25  3432.50       8
2020-10-09 09:30:02  10/09/2020  09:30:02  3432.50  ...  3432.25  3432.50      11
2020-10-09 09:30:03  10/09/2020  09:30:03  3432.25  ...  3432.25  3432.50      11
2020-10-09 09:30:04  10/09/2020  09:30:04  3432.25  ...  3432.25  3432.75     134
                        ...       ...      ...  ...      ...      ...     ...
2020-10-14 15:59:53  10/14/2020  15:59:53  3461.25  ...  3461.25  3461.50      59
2020-10-14 15:59:56  10/14/2020  15:59:56  3461.50  ...  3461.25  3461.50      61
2020-10-14 15:59:57  10/14/2020  15:59:57  3461.25  ...  3461.00  3461.75      54
2020-10-14 15:59:58  10/14/2020  15:59:58  3461.50  ...  3461.50  3461.50      25
2020-10-14 15:59:59  10/14/2020  15:59:59  3461.50  ...  3461.50  3461.75      16

I can easily resample this data into minute periods that subdivide evenly into 24 hours (or 1440 minutes/day).

rule = '60T'
 df = df.resample(rule, closed='right', label='right').agg(
    {'Open': 'first',
     'High': 'max',
     'Low': 'min',
     'Close': 'last',
     'Volume': 'sum'
     }).dropna()

How do you do this for a period that doesn't divide evenly into 1440 minutes/day? e.g. rule = '65T'

Typical US stock data goes from 9:30 to 16:00 ET. For the 65 minute bars, I'd want the following period's resampled:

09:30-10:35
10:35-11:40
11:40-12:45
12:45-13:50
13:50-14:55
14:55-16:00
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53
AaronE
  • 53
  • 5

1 Answers1

1

We can resample by setting the optional parameter origin to the starting value in datetime index DT

From the documentation of resample origin is

The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index.

So in-order to adjust the grouping to start from 9:30 we set the value of origin to the starting timestamp in the datetime index i.e. 2020-10-09 09:30:00

In addition to this we also need to set the value of optional parameter closed to left and label to left

d = {'Open': 'first', 'High': 'max', 'Low': 'min',
     'Close': 'last', 'Volume': 'sum'}

df.resample(rule='65T', 
            closed='left', label='left',
            origin=df.index.min()).agg(d).dropna()

Sample output (for illustration only)

                       Open    High      Low   Close  Volume
DT                                                          
2020-10-09 09:30:00  3432.5  3432.5  3432.25  3432.5  2405.0 # 09:30-10:35
2020-10-09 10:35:00  3432.5  3432.5  3432.25  3432.5  2405.0 # 10:35-11:40
2020-10-09 11:40:00  3432.5  3432.5  3432.25  3432.5  2405.0 # 11:40-12:45
2020-10-09 12:45:00  3432.5  3432.5  3432.25  3432.5  2405.0 # 12:45-13:50
2020-10-09 13:50:00  3432.5  3432.5  3432.25  3432.5  2405.0 # 13:50-14:55
2020-10-09 14:55:00  3432.5  3432.5  3432.25  3432.5  2405.0 # 14:55-16:00
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53
  • This resamples the df properly for data for that first day. But on the next day it doesn't. The next day shows DT at 09:00, 10:35, etc. – AaronE Apr 24 '21 at 15:59