3

Massive edit:

Ok, so I have a timeseries dataframe at the minute level. For sake of example, this dataframe is one year's worth of data. I am attempting to create an analytical model that will iterate through this data day-to-day.

The function will: 1) Slice a day's worth of data from the dataframe. 2) Create a 30 minute (of first 30 minutes of day) sub-slice of the daily slice. 3) Pass the data from both slices through the analytical part of the function. 4) Append to new dataframe. 5) Continue iterations until complete.

The dataframe is in the following format:

                           open_price high  low   close_price volume     price
2015-01-06 14:31:00+00:00   46.3800 46.440  46.29   46.380  560221.0    46.380
2015-01-06 14:32:00+00:00   46.3800 46.400  46.30   46.390  52959.0     46.390
2015-01-06 14:33:00+00:00   46.3900 46.495  46.36   46.470  100100.0    46.470
2015-01-06 14:34:00+00:00   46.4751 46.580  46.41   46.575  85615.0     46.575
2015-01-06 14:35:00+00:00   46.5800 46.610  46.53   46.537  78175.0     46.537

It seems to me that pandas datetimeindex functionalities are the best way to go about this task, but I have no idea where to start.

(1) Seems like I could use the .rollforward functionality, starting with the df start date/time, and roll forward one day through each iteration.

(2) Use a df.loc[mask] to create the subslice.

I'm fairly certain I can figure it out after (2), but once again I'm not very familiar with timeseries analysis or pandas datetimeindex functionalities.

Final dataframe:

              high     low   retrace  time
2015-01-06    46.440  46.29  True     47
2015-01-07    46.400  46.30  True     138
2015-01-08    46.495  46.36  False    NaN
2015-01-09    46.580  46.41  True     95
2015-01-10    46.610  46.53  False    NaN

High = High of first 30 minutes of day

Low = Low of first 30 minutes of day

Retrace = Boolean, if price returned to the open price at some point during the day after the first 30 minutes.

Time = The amount of time (minutes) it took to retrace.

Here's my code that seems to work (thanks all for your help!):

sample = msft_prices.ix[s_date:e_date]
sample = sample.resample('D').mean() 
sample = sample.dropna()
sample = sample.index.strftime('%Y-%m-%d')
ORTDF = pd.DataFrame()
ORDF = pd.DataFrame()
list1 = []
list2 = []
def hi_lo(prices):

        for i in sample:
            list1 = []
            if i in prices.index:

                ORTDF = prices[i+' 14:30':i+' 15:00']
                ORH = max(ORTDF['high']) #integer value
                ORHK = ORTDF['high'].idxmax()
                ORL = min(ORTDF['low']) #integer value
                ORLK = ORTDF['low'].idxmin()
                list1.append(ORH)
                list1.append(ORL)



                if ORHK < ORLK:
                    dailydf = prices[i+' 14:30':i+' 21:00']
                    if max(dailydf['high']) > ORH:
                        ORDH = max(dailydf['high'])
                        ORDHK = dailydf['high'].idxmax()
                        touched = 1
                        time_to_touch = ORDHK - ORHK
                        time_to_touch = time_to_touch.total_seconds() / 60
                        list1.append(touched)
                        list1.append(time_to_touch)
                        list2.append(list1)
                    else:
                        touched = 0
                        list1.append(touched)
                        list1.append('NaN')
                        list2.append(list1)
                elif ORHK > ORLK:
                    dailydf = prices[i+' 14:30':i+' 21:00']
                    if min(dailydf['low']) < ORL:
                        ORDL = min(dailydf['low'])
                        ORDLK = dailydf['low'].idxmin()
                        touched = 1
                        time_to_touch = ORDLK - ORLK
                        time_to_touch = time_to_touch.total_seconds() / 60
                        list1.append(touched)
                        list1.append(time_to_touch)
                        list2.append(list1)
                    else:
                        touched = 0
                        list1.append(touched)
                        list1.append('NaN')
                        list2.append(list1)


            else:
                pass


        ORDF = pd.DataFrame(list2, columns=['High', 'Low', 'Retraced', 'Time']).set_index([sample])
        return ORDF

This probably isn't the most elegant way to go about it, but hey, it works!

supernoob
  • 55
  • 1
  • 7
  • 2
    Tip: don't use jargon! Most people here won't understand financial terminology. It's not really clear what your problem is. – Will Vousden Oct 24 '16 at 20:04
  • 3
    You would be much better served by provided a minimal, complete, and verified example. http://stackoverflow.com/help/mcve – piRSquared Oct 24 '16 at 20:12
  • Hmm...should I restart from scratch? – supernoob Oct 24 '16 at 20:16
  • To clarify: 1) I have one minute timeseries data for one or more years. 2) I wish to slice by date, then slice by 30 minutes. 3) Analyze data in 30 minutes 4) Extract analysis data and append to new df. I'm trying to find examples of how to use pandas date functionalities to do this as my attempts at using simple indexing are falling flat. – supernoob Oct 24 '16 at 20:16
  • @supernoob: Edit your question to include a sample dataframe (just a few rows would do) and your expected output. (Also, please just remove the financial terminology, it's not relevant for what you're asking here). You definitely want to have an index/column with a Datatime type and then select based on that, not on the index numbers. – cd98 Oct 24 '16 at 21:17
  • @cd98: Ok, I've updated the question... hopefully it's clearer now. – supernoob Oct 24 '16 at 21:40
  • @WillVousden updated – supernoob Oct 24 '16 at 21:46
  • @piRSquared updated – supernoob Oct 24 '16 at 21:46
  • @supernoob Much clearer now. I attempted an answer, but without you stating your expected output, I'm not sure if I completely got it – cd98 Oct 24 '16 at 22:04
  • @WillVousden updated question... hopefully it's clearer now. – supernoob Oct 25 '16 at 18:47

1 Answers1

2

Read the docs for general reference

Setup (next time please provide this yourself in the question!):

dates = pd.to_datetime(['19 November 2010 9:01', '19 November 2010 9:02', '19 November 2010 9:03',
                       '20 November 2010 9:05', '20 November 2010 9:06', '20 November 2010 9:07'])
df = pd.DataFrame({'low_price': [1.2, 1.8, 1.21, 2., 4., 1.201],  
                  'high_price': [3., 1.8, 1.21, 4., 4.01, 1.201]}, index=dates)
df

                    high_price  low_price
2010-11-19 09:01:00     3.000   1.200
2010-11-19 09:02:00     1.800   1.800
2010-11-19 09:03:00     1.210   1.210
2010-11-20 09:05:00     4.000   2.000
2010-11-20 09:06:00     4.010   4.000
2010-11-20 09:07:00     1.201   1.201

I'll group by Day and then for each day apply a function that computes whether there was a retrace and the time period when it happened. Your question wasn't clear on which column to operate or what is the tolerance level to say "prices are the same", so I put them as options

def retrace_per_day(day, col='high_price', epsilon=0.5):
    """take day data and returns whether there was a retrace.
    If yes, return 1 and the minute in which it did.
    Otherwise return 0 and np.nan"""
    cond = (np.abs(day[col] - day[col][0]) < epsilon)
    cond_index = cond[cond].index
    if len(cond_index) > 1:
        retrace, period = 1, cond_index[1]
    else:
        retrace, period = 0, np.nan
    return pd.Series({'retrace': retrace, 'period' : period})

df.groupby(pd.TimeGrouper('1D')).apply(retrace_per_day)

           period   retrace
2010-11-19  NaN     0.0
2010-11-20  2010-11-20 09:06:00     1.0

You can then use this to merge back into your original dataframe if needed.

cd98
  • 3,442
  • 2
  • 35
  • 51
  • Your example is great, but I need to return values, not a dataframe unfortunately (if my approach is right). To be very specific, I need to find the High and Low within the first 30 minutes, the sequence it occurred in, then find out if price retraced from the start of the day. So the function would look like this: Identify high & low within first 30 minutes of day, append both to list1. Identify the sequence, if price retraced return 'true' or 'false', append to list1. Find time delta between start of day and retracement. Append to list1. Append list1 to list2. Begin new iteration. – supernoob Oct 24 '16 at 22:40
  • @supenoob: I have no idea what "retrace" means in this context. Can you please put the (simplified) expected output on your question? We don't need to understand all the steps, just what parts you need to combine. You probably don't need to do this append and separate into lists.. It seems to me you just need to create a new variable called: "Retracement" that will be one, say, at the minute each day when the price started to "retrace" (whatever that means!) and zero otherwise. Is that it? This is achievable by a `groupby`-`apply` combination – cd98 Oct 24 '16 at 22:45
  • When all iterations are done, build dataframe from list2. – supernoob Oct 24 '16 at 22:46
  • 'Retracement' in this context means that price returns to the opening price at the beginning of the day. So, if the stock opened at $50, dropped a bit but returned back to $50 at some point in the day then it did a full retracement. I need to identify the time delta between those two points. I'll look into the groupby - apply combo. – supernoob Oct 24 '16 at 22:50
  • @supernoob: So what is the end result? Per day a variable that says how many hours/minutes it took for the stock to "retrace"? (Day 1: 21min, Day 2: 5min, etc.) If that's it, I think we finally reached the specific question :) Please edit your question if that's the main point of your question and I think I can probably answer it. – cd98 Oct 25 '16 at 01:30
  • Updated the question with the desired final df. – supernoob Oct 25 '16 at 17:31
  • That made me understand it. I still don't understand many parts of your question, but the expected output made me (mostly) figure it out. For the future, please try being clearer in the question and expected output. After that you can talk about what you've tried doing and extra stuff. – cd98 Oct 26 '16 at 02:59
  • Thanks for the suggestion! I finally got it to work, it's probably not as elegant as yours, but it gives me what I need. Thanks for hanging in there and providing solutions, much appreciated. – supernoob Oct 26 '16 at 16:27