5

I'd like to find the start and stop points of a column and flag them as shown below:

value flag
NaN NaN
NaN NaN
1 start
2 NaN
1 NaN
3 NaN
2 stop
NaN NaN
1 start
2 stop
tdy
  • 36,675
  • 19
  • 86
  • 83
kswizzle101_
  • 99
  • 1
  • 4

3 Answers3

5
  • start occurs when the current value is notnull and the previous value isnull
  • stop occurs when the current value is notnull and the next value isnull

Generate these conditions using shift and assign using loc:

start = df.value.notnull() & df.value.shift().isnull()
stop = df.value.notnull() & df.value.shift(-1).isnull()

df.loc[start, 'flag'] = 'start'
df.loc[stop, 'flag'] = 'stop'

#    value   flag
# 0    NaN    NaN
# 1    NaN    NaN
# 2    1.0  start
# 3    2.0    NaN
# 4    1.0    NaN
# 5    3.0    NaN
# 6    2.0   stop
# 7    NaN    NaN
# 8    1.0  start
# 9    2.0   stop

Alternatively assign using mask:

df['flag'] = df['flag'].mask(start, 'start')
df['flag'] = df['flag'].mask(stop, 'stop')
tdy
  • 36,675
  • 19
  • 86
  • 83
  • 1
    Related: [How to detect 3 NaN + 1 non-NaN in sequence](https://stackoverflow.com/a/70223106/13138364) – tdy Dec 04 '21 at 08:37
  • Hi, could you recommend a method to only tag if the non-nulls are longer than a certain length? I.e. there needs to be 10 consecutive numbers for it to tag start and stop. I can do it using multiple shifts but it makes it quite long and confusing! Thanks – kswizzle101_ Jul 28 '22 at 10:12
1

Here I iterated through the rows and used a flag to indicate if we were starting or not.

start_flag = 0
for index, row in df.iterrows():
  if row['val'].isnull():
    df.loc[index, 'flag'] = "NaN"
    start_flag = 0
  else:
    if start_flag == 0:
      df.loc[index, 'flag'] = "start"
      start_flag = 1
    else:
      if (index < df.shape[0]-1 and df.loc[index+1, 'val'].isnull()) or index == df.shape[0]-1:
         df.loc[index, 'flag'] = "stop"
Zen
  • 115
  • 7
  • 1
    please do not use iteration unless you have no other options. This would be a bad idea to iterate through all the rows. It is costly and slow – Joe Ferndz Nov 25 '21 at 03:58
1

Here's what you need:

# Auxiliar columns to detect start and end
df['Past'] = df['Value'].shift(-1)
df['Future'] = df['Value'].shift(1)

# Auxiliar function to complete new column
def Find_start_stop_Null(row):
    flag = np.nan
    if ((not pd.isnull(row['Value'])) and (pd.isnull(row['Future']))):
        flag = 'start'
    elif ((not pd.isnull(row['Value'])) and (pd.isnull(row['Past']))):
        flag = 'stop'
    return flag

df['flag'] = df.apply(lambda row: Find_start_stop_Null(row), axis=1)
# Drop unnecessary columns
df = df.drop('Past', axis=1)
df = df.drop('Future', axis=1)
Guillermo Garcia
  • 528
  • 4
  • 11