-1

I would like to create a running dataframe of trading data for the next four hours from the current time while skipping non-trading hours (5-6pm weekdays, Saturday-6pm Sunday). For example, at 4pm on Friday, I'd like a dataframe that runs from 4pm to 5pm on Friday and then 6pm-9pm on Sunday.

Currently, I am using the following:

time_parameter = pd.Timedelta(hours=4) #Set time difference to four hours
df = df.set_index(['Time']) 

for current_time, row in df.iterrows(): #df is the entire trading data df
    future_time = current_time + time_parameter
    temp_df = df.loc[current_time : future_time]

This obviously doesn't skip non-trading hours so I am trying to find an efficient way to do that.

One method I can use is creating a set of non-trading hours, checking if the current time bounds (current_time:future_time) include any, and adding an additional hour for each.

However, since the dataset has about 3.5million rows and would need this check for each row, I want to ask if anyone may know of a faster approach?

In short, looking for a method to add 4 business hours (Sun-Fri 6pm-5pm) to current time. Thanks!

Input Data: This shows the first 19 rows of the trading data

Expected Output Data: This shows the first and last 3 rows from a four hour period starting at 18:00:30 on January 8th, 2017


Solution

Based on the answer by Code Different below, I used the following:

def last_trading_hour(start_time, time_parameter, periods_parameter):
    start_series = pd.date_range(start_time, freq='H', periods = periods_parameter)
    mask = (((start_series.dayofweek == 6) & (time_2(18) <= start_series.time)) #Sunday: After 6pm
    | ((start_series.dayofweek == 4) & (start_series.time < time_2(17))) #Friday before 5pm
    | ((start_series.dayofweek < 4) & (start_series.time < time_2(17))) #Mon-Thur before 5pm
    | ((start_series.dayofweek < 4) & (time_2(18) <= start_series.time)) #Mon-Thur after 6pm
    )

    return start_series[mask][time_parameter]

start_time = pd.Timestamp('2019-08-16 13:00:10')
time_parameter = 4 #Adding 4 hours to time
periods_parameter = 49 + time_parameter #Max 49 straight hours of no-trades (Fri 5pm-Sun 6pm)
last_trading_hour(start_time, time_parameter, periods_parameter)

Results:

Timestamp('2019-08-18 18:00:10')

If you need the entire series, follow Code Different's method for indexing.

  • Welcome to StackOverflow. Please follow the posting guidelines in the help documentation, as suggested when you created this account. [On topic](https://stackoverflow.com/help/on-topic), [how to ask](https://stackoverflow.com/help/how-to-ask), and ... [the perfect question](https://codeblog.jonskeet.uk/2010/08/29/writing-the-perfect-question/) apply here. StackOverflow is not a design, coding, research, or tutorial resource. Show us your input, output, and a coding attempt? – Prune Aug 21 '19 at 18:21

2 Answers2

0

It's hard to tell from so little information. However, it seems that you're working on hour boundaries. If so, it should be straightforward to set up a look-up table (dict) keyed by each day and hour, perhaps: (0,0) for midnight Sun/Mon, (2, 13) for 1pm Wed, and so on. Then provide simple entries for the end of the 4-hour period

(0, 0): Timedelta(hours= 4),    #  0:00 Mon, normal span; regular trading hours
(0,16): Timedelta(hours= 5),    # 16:00 Sun; 1 hour of down-time
(4,16): Timedelta(hours=53),    # 16:00 Fri; 1 hour trade, 49 hrs down, 3 hrs trade
(5,16): Timedelta(hours=26),    # 16:00 Sat; 26 hours down, 4 hours trade

Add the indicated Timedelta to your start time; that gives you the end time of the period. You can write a few loops and if statements to compute these times for you, or just hard-code all 168; they're rather repetitive.

Checking your data base lines remains up to you, since you didn't specify their format or semantics in your posting.

Prune
  • 76,765
  • 14
  • 60
  • 81
  • Hey! Thanks for your response. What additional information would have been useful to include in my post? – panthpatel Aug 21 '19 at 19:32
  • Please refer to the general instructions in my comment to the question, and the details of the data I mentioned in passing in this answer. – Prune Aug 21 '19 at 20:48
0

Generate a sufficiently long series of hours then filter for the first 4 that are trading hours:

from datetime import time

start_time = pd.Timestamp('2019-08-16 16:00')
s = pd.date_range(start_time, freq='H', periods=72)

is_trading_hour = (
    ((s.weekday == 6) & (time(18) <= s.time))
    | ((s.weekday == 4) & (s.time < time(17)))
    | (s.weekday < 4)
)

s[is_trading_hour][:4]

Result:

DatetimeIndex(['2019-08-16 16:00:00', '2019-08-18 18:00:00',
               '2019-08-18 19:00:00', '2019-08-18 20:00:00'],
              dtype='datetime64[ns]', freq=None)
Code Different
  • 90,614
  • 16
  • 144
  • 163
  • Thanks! This was very helpful. I altered it a bit for my specific purposes and have added that above in case anyone else has a similar question. – panthpatel Aug 22 '19 at 01:40