1

Python newbie here but I have some data that is intra-day financial data, going back to 2012, so it's got the same hours each day(same trading session each day) but just different dates. I want to be able to select certain times out of the data and check the corresponding OHLC data for that period and then do some analysis on it.

So at the moment it's a CSV file, and I'm doing:

import pandas as pd
data = pd.DataFrame.read_csv('data.csv')

date = data['date']
op = data['open']
high = data['high']
low = data['low']
close = data['close']
volume = data['volume']

The thing is that the date column is in the format of "dd/mm/yyyy 00:00:00 "as one string or whatever, so is it possible to still select between a certain time, like between "09:00:00" and "10:00:00"? or do I have to separate that time bit from the date and make it it's own column? If so, how?

So I believe pandas has a between_time() function, but that seems to need a DataFrame, so how can I convert it to a DataFrame, then I should be able to use the between_time function to select between the times I want. Also because there's obviously thousands of days, all with their own "xx:xx:xx" to "xx:xx:xx" I want to pull that same time period I want to look at from each day, not just the first lot of "xx:xx:xx" to "xx:xx:xx" as it makes its way down the data, if that makes sense. Thanks!!

DYZ
  • 55,249
  • 10
  • 64
  • 93
spergy
  • 49
  • 2
  • 8
  • actually the date format is yyyy-mm-dd. I think this is what is making it not work properly. I've done parse_dates=True and index_col=0 as arguments for read_csv() but it's giving me a KeyError: 'date' when I try to print df.head(). – spergy Jan 18 '17 at 05:03
  • What I would really like to achieve is calculate the range(high - low) for each day on all my data within my desired time slot(07:00:00 - 08:00:00). can anyone help? Thanks! – spergy Jan 18 '17 at 07:29

1 Answers1

2

Consider the dataframe df

from pandas_datareader import data

df = data.get_data_yahoo('AAPL', start='2016-08-01', end='2016-08-03')
df = df.asfreq('H').ffill()

option 1
convert index to series then dt.hour.isin

slc = df.index.to_series().dt.hour.isin([9, 10])
df.loc[slc]

option 2
numpy broadcasting

slc = (df.index.hour[:, None] == [9, 10]).any(1)
df.loc[slc]

enter image description here


response to comment

To then get a range within that time slot per day, use resample + agg + np.ptp (peak to peak)

df.loc[slc].resample('D').agg(np.ptp)
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • What I would really like to achieve is calculate the range(high - low) for each day on all my data within my desired time slot(07:00:00 - 08:00:00). can anyone help? Thanks! – spergy Jan 18 '17 at 07:29
  • Thanks! Yeah I know sorry, couldn't figure out how to edit my original post, just saw the edit option, doh! :( – spergy Jan 18 '17 at 09:00