Function to modify row values using df.apply or similar in Pandas

Question

Overview

I'm working on a dataframe where df["Pivots"] alternates between 1 and -1 whenever a high or low has previously been identified by a zigzag indicator.

I am trying to implement the following with Pandas on a dataframe and modify the relevant rows when df["Pivots"] has previously (incorrectly) been assigned a value of 1 which marks a high but another row actually has a higher High value.

Please see the screenshots below for a visual representation of the data and the desired output.

Pseudocode

If the current row has a -1 in df["Pivots"]
rows_between = index < current row and index > last pivot value in df["Pivots"] which will be a 1
If df.High in rows_between > df["Pivot Price"] in the current row then actual_high is df[High].max() in rows_between.
Remove the 1 from df["Pivots"] & value from df["Pivot Price"] in current row and add it to df["Pivots"] & df["Pivot Price"] in row which is actual_high

Example

In this example df.High in rows 2023-10-08 is actual_high and higher than df["Pivot Price"] in row 2023-09-24.

This is the original dataframe.

This is the desired output:

Actual dataframe will contain many rows and this is just a Minimal, Reproducible Example.

Code

    df.to_dict()
    
   {'Open': {Timestamp('2023-09-24 00:00:00', freq='W-SUN'): 1.0427,
  Timestamp('2023-10-01 00:00:00', freq='W-SUN'): 1.0586,
  Timestamp('2023-10-08 00:00:00', freq='W-SUN'): 1.0314,
  Timestamp('2023-10-15 00:00:00', freq='W-SUN'): 1.0669,
  Timestamp('2023-10-22 00:00:00', freq='W-SUN'): 1.0058,
  Timestamp('2023-10-29 00:00:00', freq='W-SUN'): 0.9966},
 'High': {Timestamp('2023-09-24 00:00:00', freq='W-SUN'): 1.0621,
  Timestamp('2023-10-01 00:00:00', freq='W-SUN'): 1.0609,
  Timestamp('2023-10-08 00:00:00', freq='W-SUN'): 1.0714,
  Timestamp('2023-10-15 00:00:00', freq='W-SUN'): 1.0679,
  Timestamp('2023-10-22 00:00:00', freq='W-SUN'): 1.0198,
  Timestamp('2023-10-29 00:00:00', freq='W-SUN'): 0.9966},
 'Low': {Timestamp('2023-09-24 00:00:00', freq='W-SUN'): 1.0383,
  Timestamp('2023-10-01 00:00:00', freq='W-SUN'): 1.0297,
  Timestamp('2023-10-08 00:00:00', freq='W-SUN'): 1.0285,
  Timestamp('2023-10-15 00:00:00', freq='W-SUN'): 1.004,
  Timestamp('2023-10-22 00:00:00', freq='W-SUN'): 0.9941,
  Timestamp('2023-10-29 00:00:00', freq='W-SUN'): 0.938},
 'Close': {Timestamp('2023-09-24 00:00:00', freq='W-SUN'): 1.0577,
  Timestamp('2023-10-01 00:00:00', freq='W-SUN'): 1.0297,
  Timestamp('2023-10-08 00:00:00', freq='W-SUN'): 1.0666,
  Timestamp('2023-10-15 00:00:00', freq='W-SUN'): 1.0053,
  Timestamp('2023-10-22 00:00:00', freq='W-SUN'): 0.9988,
  Timestamp('2023-10-29 00:00:00', freq='W-SUN'): 0.9528},
 'Pivots': {Timestamp('2023-09-24 00:00:00', freq='W-SUN'): 1,
  Timestamp('2023-10-01 00:00:00', freq='W-SUN'): 0,
  Timestamp('2023-10-08 00:00:00', freq='W-SUN'): 0,
  Timestamp('2023-10-15 00:00:00', freq='W-SUN'): 0,
  Timestamp('2023-10-22 00:00:00', freq='W-SUN'): 0,
  Timestamp('2023-10-29 00:00:00', freq='W-SUN'): -1},
 'Pivot Price': {Timestamp('2023-09-24 00:00:00', freq='W-SUN'): 1.0621,
  Timestamp('2023-10-01 00:00:00', freq='W-SUN'): nan,
  Timestamp('2023-10-08 00:00:00', freq='W-SUN'): nan,
  Timestamp('2023-10-15 00:00:00', freq='W-SUN'): nan,
  Timestamp('2023-10-22 00:00:00', freq='W-SUN'): nan,
  Timestamp('2023-10-29 00:00:00', freq='W-SUN'): 0.938},
 'Date': {Timestamp('2023-09-24 00:00:00', freq='W-SUN'): Timestamp('2023-09-24 00:00:00'),
  Timestamp('2023-10-01 00:00:00', freq='W-SUN'): Timestamp('2023-10-01 00:00:00'),
  Timestamp('2023-10-08 00:00:00', freq='W-SUN'): Timestamp('2023-10-08 00:00:00'),
  Timestamp('2023-10-15 00:00:00', freq='W-SUN'): Timestamp('2023-10-15 00:00:00'),
  Timestamp('2023-10-22 00:00:00', freq='W-SUN'): Timestamp('2023-10-22 00:00:00'),
  Timestamp('2023-10-29 00:00:00', freq='W-SUN'): Timestamp('2023-10-29 00:00:00')}}

For reference, this is the code that generates these pivots.

So at any given time, `df['Pivots']` will only contain a single `1` and a single `-1`? Or have I misunderstood and there can be more? — tdy, Mar 27 '21 at 12:46

dataista · Accepted Answer · 2021-03-27T12:30:31.343

I cannot think of a short solution using .apply(), but with some auxiliary functions you could solve the problem with the following code:

import numpy as np

def get_highs_idx(df):
    return df[df['Pivots'] == 1].index.tolist()

def get_lows_idx(df):
    return df[df['Pivots'] == -1].index.tolist()

def get_previous_high_idx(df, low_idx):
    highs_idx = get_highs_idx(df)
    for high_idx in reversed(highs_idx):
        if high_idx < low_idx:
            return high_idx
    return None

def reset_pivot(df, old_high_idx, new_high_idx):
    df.loc[old_high_idx, 'Pivots'] = 0
    df.loc[old_high_idx, 'Pivot Price'] = np.nan
    df.loc[new_high_idx, 'Pivots'] = 1
    df.loc[new_high_idx, 'Pivot Price'] = df.loc[new_high_idx, 'High']
            
def correct_highs(df):
    lows_idx = get_lows_idx(df)
    for low_idx in lows_idx:
        high_idx = get_previous_high_idx(df, low_idx)
        if high_idx is not None:
            new_high_idx = df.loc[high_idx:low_idx, 'High'].idxmax()
            if high_idx != new_high_idx:
                reset_pivot(df, high_idx, new_high_idx)
            
correct_highs(df)

The code might be reduced a little, but I think this way is more legible.

Edit

Following your comment, I add below the code for correcting the lows too.


def get_previous_low_idx(df, high_idx):
    lows_idx= get_lows_idx(df)
    for low_idx in reversed(lows_idx):
        if low_idx < high_idx:
            return low_idx
    return None

def reset_low_pivot(df, old_low_idx, new_low_idx):
    df.loc[old_low_idx, 'Pivots'] = 0
    df.loc[old_low_idx, 'Pivot Price'] = np.nan
    df.loc[new_low_idx, 'Pivots'] = -1
    df.loc[new_low_idx, 'Pivot Price'] = df.loc[new_low_idx, 'Low']
            
def correct_lows(df):
    highs_idx = get_highs_idx(df)
    for high_idx in highs_idx:
        low_idx = get_previous_low_idx(df, high_idx)
        if low_idx is not None:
            new_low_idx = df.loc[low_idx:high_idx, 'Low'].idxmin()
            if low_idx != new_low_idx:
                reset_low_pivot(df, low_idx, new_low_idx)


            
correct_lows(df)

I didn't want to affect the original answer, but you might want to rename reset_pivot to reset_high_pivot for consistency.

Also a high level function could be added:

def correct_pivots(df):
    correct_highs(df)
    correct_lows(df)

Can you please also show how you would modify any incorrect lows using this same approach with `correct_lows(df)` — nipy, Mar 27 '21 at 12:01
Just added an edition with the correct_lows(df) function. I thought there was a problem with the approach in general, but since the sequence is of alternating 1 and -1 (with zeros in between) the approach holds. — dataista, Mar 27 '21 at 12:31
Unfortunately this sometimes creates an error. This [dataframe](https://pastebin.com/raw/baentnLt) shows incorrect lows are removed. Is this something you can take a look at please? — nipy, Apr 06 '21 at 23:16
Hi @dataista is [this](https://pastebin.com/raw/baentnLt) something you can help with please? Should I start a new question? — nipy, Apr 10 '21 at 10:37
Hey nipy, I'm starting a new position tomorrow. I won't be able to see this soon, unfortunately. — dataista, Apr 12 '21 at 01:38
if you ever get a chance to look at this in the future would be appreciated. — nipy, May 05 '21 at 06:46

Function to modify row values using df.apply or similar in Pandas

1 Answers1

Edit