Massive edit:
Ok, so I have a timeseries dataframe at the minute level. For sake of example, this dataframe is one year's worth of data. I am attempting to create an analytical model that will iterate through this data day-to-day.
The function will: 1) Slice a day's worth of data from the dataframe. 2) Create a 30 minute (of first 30 minutes of day) sub-slice of the daily slice. 3) Pass the data from both slices through the analytical part of the function. 4) Append to new dataframe. 5) Continue iterations until complete.
The dataframe is in the following format:
open_price high low close_price volume price
2015-01-06 14:31:00+00:00 46.3800 46.440 46.29 46.380 560221.0 46.380
2015-01-06 14:32:00+00:00 46.3800 46.400 46.30 46.390 52959.0 46.390
2015-01-06 14:33:00+00:00 46.3900 46.495 46.36 46.470 100100.0 46.470
2015-01-06 14:34:00+00:00 46.4751 46.580 46.41 46.575 85615.0 46.575
2015-01-06 14:35:00+00:00 46.5800 46.610 46.53 46.537 78175.0 46.537
It seems to me that pandas datetimeindex functionalities are the best way to go about this task, but I have no idea where to start.
(1) Seems like I could use the .rollforward functionality, starting with the df start date/time, and roll forward one day through each iteration.
(2) Use a df.loc[mask] to create the subslice.
I'm fairly certain I can figure it out after (2), but once again I'm not very familiar with timeseries analysis or pandas datetimeindex functionalities.
Final dataframe:
high low retrace time
2015-01-06 46.440 46.29 True 47
2015-01-07 46.400 46.30 True 138
2015-01-08 46.495 46.36 False NaN
2015-01-09 46.580 46.41 True 95
2015-01-10 46.610 46.53 False NaN
High = High of first 30 minutes of day
Low = Low of first 30 minutes of day
Retrace = Boolean, if price returned to the open price at some point during the day after the first 30 minutes.
Time = The amount of time (minutes) it took to retrace.
Here's my code that seems to work (thanks all for your help!):
sample = msft_prices.ix[s_date:e_date]
sample = sample.resample('D').mean()
sample = sample.dropna()
sample = sample.index.strftime('%Y-%m-%d')
ORTDF = pd.DataFrame()
ORDF = pd.DataFrame()
list1 = []
list2 = []
def hi_lo(prices):
for i in sample:
list1 = []
if i in prices.index:
ORTDF = prices[i+' 14:30':i+' 15:00']
ORH = max(ORTDF['high']) #integer value
ORHK = ORTDF['high'].idxmax()
ORL = min(ORTDF['low']) #integer value
ORLK = ORTDF['low'].idxmin()
list1.append(ORH)
list1.append(ORL)
if ORHK < ORLK:
dailydf = prices[i+' 14:30':i+' 21:00']
if max(dailydf['high']) > ORH:
ORDH = max(dailydf['high'])
ORDHK = dailydf['high'].idxmax()
touched = 1
time_to_touch = ORDHK - ORHK
time_to_touch = time_to_touch.total_seconds() / 60
list1.append(touched)
list1.append(time_to_touch)
list2.append(list1)
else:
touched = 0
list1.append(touched)
list1.append('NaN')
list2.append(list1)
elif ORHK > ORLK:
dailydf = prices[i+' 14:30':i+' 21:00']
if min(dailydf['low']) < ORL:
ORDL = min(dailydf['low'])
ORDLK = dailydf['low'].idxmin()
touched = 1
time_to_touch = ORDLK - ORLK
time_to_touch = time_to_touch.total_seconds() / 60
list1.append(touched)
list1.append(time_to_touch)
list2.append(list1)
else:
touched = 0
list1.append(touched)
list1.append('NaN')
list2.append(list1)
else:
pass
ORDF = pd.DataFrame(list2, columns=['High', 'Low', 'Retraced', 'Time']).set_index([sample])
return ORDF
This probably isn't the most elegant way to go about it, but hey, it works!