1

I'm trying to find the fastest solution to iterate over each row of Open High Low Close data and count out how many rows exist between the current row's high and the next high that’s greater than or equal to the original. Here is a shortened code example which I think explains it well (I'd be trying to find the final value of _next):

for i in df.index:
    while df.loc[i, 'high'] > df.loc[_next, 'high']:
         _next += 1

I think I need to use a lambda function inside apply but I can't seem to get it right. Been trying to figure this out for days and would love any advice!

Using this data, the function should return [2, 0, 0, 3, 1, 0, 0]

d = {'high': [1.2, 1.1, 1.1, 1.4, 1.3, 1.2, 1.3]}
df = pd.DataFrame(data=d)


    high   rows_to_higher_high
0   1.2    2
1   1.1    0
2   1.1    0
3   1.4    3
4   1.3    1
5   1.2    0
6   1.3    0

Edit: itertuples(name=None, index=True) is the fastest way to do it so far.

Rudolph
  • 13
  • 4

2 Answers2

1

using list comprehension and get index to list

In [166]: d = {'high': [1.2, 1.1, 1.1, 1.4, 1.3, 1.2, 1.3]}

In [167]: df = pd.DataFrame(data=d)

In [168]: df['rows_to_higher_high']=[(df['high'].values[i+1:]>=val).tolist().index(True) if True in (df['high'].values[i+1:]>=val).tolist() else le
     ...: n((df['high'].values[i+1:]>=val).tolist())  for i, val in enumerate(df['high'].values)]

In [169]: df
Out[169]:
   high  rows_to_higher_high
0   1.2                    2
1   1.1                    0
2   1.1                    0
3   1.4                    3
4   1.3                    1
5   1.2                    0
6   1.3                    0
Akhilesh_IN
  • 1,217
  • 1
  • 13
  • 19
  • Thank you Akhilesh, that worked. Unfortunately it doesn't seem to be a huge improvement over the simple loop I posted in the first example – Rudolph Jul 20 '20 at 02:36
0

Base on my understanding, we can get the subtraction from current row by all value, then get the upper tri , and get the index max with argmax when the value is more than 0

Sample :

s
0    1
1    2
2    1
3    3
4    1
5    1
6    5
dtype: int64

Solution :

np.clip(np.triu((s.values-s.values[:,None])>0,0).argmax(1)-np.arange(len(s)),a_min=0, a_max=None)
array([1, 2, 1, 3, 2, 1, 0])
BENY
  • 317,841
  • 20
  • 164
  • 234
  • That is mind numbing, however not what I'm trying to do. I've made an addendum to the post to add in data. – Rudolph Jul 18 '20 at 04:01