0

I am looking at OHLC data using Pandas and a little stuck on whether I can avoid a loop for what I need to achieve. Basically, for the 'high' column, I want to lookback an arbitrary number of rows, say 10, looking for situation where the previous high is within 0.01 => caveat, if high is 81.21, 82.2, 82.19, 81.2, for the last row (81.2) I would not include the first (81.21) as it's already encountered 82.19 so don't want to have an absolute gap of 0.02, hope that makes sense.

I'm struggling to work out whether to use rolling window, or just apply by row or if there are any good ideas that would suit that I'm completely missing? Below is the a 10 row version of the dataframe (df) and the end desired output (end_df), which would have a boolean column for range bound, a timestamp when the range started (minimum of 2 periods), the low and high within given range and a count of how many periods there are.

I realise it would make sense to filter only those rows where the diff from the prev row is <= 0.01, as that would correctly identify whether range bound by definition above, then could either rolling lookback or something else I guess?

I can get to a solution easily just not very speedy or elegant, thanks in advance if you have any thoughts, code below:

import pandas as pd
from numpy import NaN
from pandas import NaT

# Starting dataframe df
df_data = {'columns': ['ts', 'high', 'low'],
        'data': [['2021-11-11 21:50:00', 81.18, 81.16],
                 ['2021-11-11 21:51:00', 81.2, 81.17],
                 ['2021-11-11 21:52:00', 81.2, 81.19],
                 ['2021-11-11 21:53:00', 81.2, 81.2],
                 ['2021-11-11 21:54:00', 81.2, 81.17],
                 ['2021-11-11 21:55:00', 81.19, 81.12],
                 ['2021-11-11 21:56:00', 81.16, 81.15],
                 ['2021-11-11 21:57:00', 81.18, 81.16],
                 ['2021-11-11 21:58:00', 81.18, 81.16],
                 ['2021-11-11 21:59:00', 81.21, 81.16]]}

df = pd.DataFrame(data=df_data['data'], columns=df_data['columns'])
df = df.assign(ts=pd.to_datetime(df.ts))
df = df.set_index('ts')

# desired output, end_df
end_data = {'columns': ['ts', 'high', 'low', 'range_bound', 'range_start', 'range_periods', 'range_low', 'range_high'],
        'data': [['2021-11-11 21:50:00', 81.18, 81.16, False, NaT, 0, NaN, NaN],
                 ['2021-11-11 21:51:00', 81.2, 81.17, False, NaT, 0, NaN, NaN],
                 ['2021-11-11 21:52:00', 81.2, 81.19, True, '2021-11-11 21:51:00', 2, 81.17, 81.2],
                 ['2021-11-11 21:53:00', 81.2, 81.2, True, '2021-11-11 21:51:00', 3, 81.17, 81.2],
                 ['2021-11-11 21:54:00', 81.2, 81.17, True, '2021-11-11 21:51:00', 4, 81.17, 81.2],
                 ['2021-11-11 21:55:00', 81.19, 81.12, True, '2021-11-11 21:51:00', 5, 81.17, 81.2],
                 ['2021-11-11 21:56:00', 81.16, 81.15, False, NaT, 0, NaN, NaN],
                 ['2021-11-11 21:57:00', 81.18, 81.16, False, NaT, 0, NaN, NaN],
                 ['2021-11-11 21:58:00', 81.18, 81.16, True, '2021-11-11 21:57:00', 2, 81.16, 81.18],
                 ['2021-11-11 21:59:00', 81.21, 81.16, False, NaT, 0, NaN, NaN]]}

end_df = pd.DataFrame(data=end_data['data'], columns=end_data['columns'])
end_df = end_df.assign(ts=pd.to_datetime(end_df.ts), range_start=pd.to_datetime(end_df.range_start))
end_df = end_df.set_index('ts')
  • It's a little unclear what you are trying to accomplish with this data. Aren't there indicators like donchian price channels or similar that might help? If you are just trying to find tightly range bound trading, there may be other methods to help. – Jonathan Leon Dec 08 '21 at 03:18
  • I'm looking for signals downstream with this as price moves about following a range - just wanted to work out if this first step could be vectorized easily... – jacksonwoody Dec 08 '21 at 09:19

0 Answers0