I would like to create a running dataframe of trading data for the next four hours from the current time while skipping non-trading hours (5-6pm weekdays, Saturday-6pm Sunday). For example, at 4pm on Friday, I'd like a dataframe that runs from 4pm to 5pm on Friday and then 6pm-9pm on Sunday.
Currently, I am using the following:
time_parameter = pd.Timedelta(hours=4) #Set time difference to four hours
df = df.set_index(['Time'])
for current_time, row in df.iterrows(): #df is the entire trading data df
future_time = current_time + time_parameter
temp_df = df.loc[current_time : future_time]
This obviously doesn't skip non-trading hours so I am trying to find an efficient way to do that.
One method I can use is creating a set of non-trading hours, checking if the current time bounds (current_time:future_time) include any, and adding an additional hour for each.
However, since the dataset has about 3.5million rows and would need this check for each row, I want to ask if anyone may know of a faster approach?
In short, looking for a method to add 4 business hours (Sun-Fri 6pm-5pm) to current time. Thanks!
Input Data: This shows the first 19 rows of the trading data
Solution
Based on the answer by Code Different below, I used the following:
def last_trading_hour(start_time, time_parameter, periods_parameter):
start_series = pd.date_range(start_time, freq='H', periods = periods_parameter)
mask = (((start_series.dayofweek == 6) & (time_2(18) <= start_series.time)) #Sunday: After 6pm
| ((start_series.dayofweek == 4) & (start_series.time < time_2(17))) #Friday before 5pm
| ((start_series.dayofweek < 4) & (start_series.time < time_2(17))) #Mon-Thur before 5pm
| ((start_series.dayofweek < 4) & (time_2(18) <= start_series.time)) #Mon-Thur after 6pm
)
return start_series[mask][time_parameter]
start_time = pd.Timestamp('2019-08-16 13:00:10')
time_parameter = 4 #Adding 4 hours to time
periods_parameter = 49 + time_parameter #Max 49 straight hours of no-trades (Fri 5pm-Sun 6pm)
last_trading_hour(start_time, time_parameter, periods_parameter)
Results:
Timestamp('2019-08-18 18:00:10')
If you need the entire series, follow Code Different's method for indexing.