1

I have the timeseries of Euro-US Dollar Exchange Rate at minute granularity spanning the entire 2015 year, including non-trading days (ex.weekends) where the timeseries value get repeated for the entire non-trading period.

I need to discard such periods by selecting only the data between Sunday 23:00 pm and Friday 23:00 pm.

I haven't found a solution yet for Pandas (I know how to select between times inside a day and select between days). I could simply shift the time by 1h and then select only the business days but this is a sub-optimal solution.

Any idea on how to achieve this?

Example of data:

Local time,              Open,    High,    Low,     Close,   Volume
02.01.2015 22:58:00.000, 1.20008, 1.20016, 1.20006, 1.20009, 119.84
02.01.2015 22:59:00.000, 1.20009, 1.20018, 1.20004, 1.20017, 40.61
02.01.2015 23:00:00.000, 1.20017, 1.20017, 1.20017, 1.20017, 0
02.01.2015 23:01:00.000, 1.20017, 1.20017, 1.20017, 1.20017, 0
...
04.01.2015 22:58:00.000, 1.20017, 1.20017, 1.20017, 1.20017, 0
04.01.2015 22:59:00.000, 1.20017, 1.20017, 1.20017, 1.20017, 0
04.01.2015 23:00:00.000, 1.19495, 1.19506, 1.19358, 1.19410, 109.4
04.01.2015 23:01:00.000, 1.19408, 1.19414, 1.19052, 1.19123, 108.12
...
SergeGardien
  • 137
  • 2
  • 11

1 Answers1

1

consider the pd.DataFrame df and pd.tseries.index.DatetimeIndex tidx

tidx = pd.date_range('2010-01-01', '2011-01-01', freq='H')
df = pd.DataFrame(np.ones((tidx.shape[0], 2)), tidx, columns=list('AB'))

we can construct a series of values for which to filter

day_hour = (((tidx.weekday + 1) % 7) * 100) + tidx.hour

determine which values are prior to Friday 23:00

before_friday = day_hour <= 523

And after Sunday 23:00

after_sunday = day_hour >= 23

Filter our df based on above conditions

df[before_friday & after_sunday]
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • Thanks piRSquared, this seems a good solution but it is for hourly data so I've modified the following: – SergeGardien Oct 14 '16 at 09:30
  • tidx = pd.date_range('2015-01-01', '2016-01-01', freq='1min') before_friday = day_hour <= 522 – SergeGardien Oct 14 '16 at 09:35
  • This works with your dummy array df2 = pd.DataFrame(np.ones((tidx.shape[0], 2)), tidx, columns=list('AB')) but on my original array I have the following problem: "ValueError: Item wrong length 525601 instead of 525540." – SergeGardien Oct 14 '16 at 09:36
  • Apologies for the multiple comments but I'm still a StackOverflow newbie (pressed enter to go next line and instead it posted the comment) and it doesn't provide me with an edit button. – SergeGardien Oct 14 '16 at 09:44
  • I'm going to be honest, I don't mind multiple comments. – piRSquared Oct 14 '16 at 13:02
  • I further modified your code to match the length of the 2 arrays (525601 vs 525540) with the following command: tidx = pd.date_range('2015-01-01 00:00:00', '2015-12-31 22:59:00', freq='1min'). However I've realized that the final code isn't working properly, I checked the month of May and the week-end values don't get filtered properly. Any idea on how to fix this? – SergeGardien Oct 14 '16 at 14:09
  • Fixed by using the index of my dataframe with df.index = tidx – SergeGardien Oct 14 '16 at 17:09