pandas find start and stop point of non-null values

Question

I'd like to find the start and stop points of a column and flag them as shown below:

value	flag
NaN	NaN
NaN	NaN
1	start
2	NaN
1	NaN
3	NaN
2	stop
NaN	NaN
1	start
2	stop

tdy · Accepted Answer · 2021-11-25T03:27:28.120

5

start occurs when the current value is notnull and the previous value isnull
stop occurs when the current value is notnull and the next value isnull

Generate these conditions using shift and assign using loc:

start = df.value.notnull() & df.value.shift().isnull()
stop = df.value.notnull() & df.value.shift(-1).isnull()

df.loc[start, 'flag'] = 'start'
df.loc[stop, 'flag'] = 'stop'

#    value   flag
# 0    NaN    NaN
# 1    NaN    NaN
# 2    1.0  start
# 3    2.0    NaN
# 4    1.0    NaN
# 5    3.0    NaN
# 6    2.0   stop
# 7    NaN    NaN
# 8    1.0  start
# 9    2.0   stop

Alternatively assign using mask:

df['flag'] = df['flag'].mask(start, 'start')
df['flag'] = df['flag'].mask(stop, 'stop')

edited Nov 25 '21 at 03:27

answered Nov 25 '21 at 03:22

tdy

36,675
19
86
83

1

Related: [How to detect 3 NaN + 1 non-NaN in sequence](https://stackoverflow.com/a/70223106/13138364) – tdy Dec 04 '21 at 08:37
Hi, could you recommend a method to only tag if the non-nulls are longer than a certain length? I.e. there needs to be 10 consecutive numbers for it to tag start and stop. I can do it using multiple shifts but it makes it quite long and confusing! Thanks – kswizzle101_ Jul 28 '22 at 10:12

score 1 · Answer 2 · answered Nov 24 '21 at 17:22

1

Here I iterated through the rows and used a flag to indicate if we were starting or not.

start_flag = 0
for index, row in df.iterrows():
  if row['val'].isnull():
    df.loc[index, 'flag'] = "NaN"
    start_flag = 0
  else:
    if start_flag == 0:
      df.loc[index, 'flag'] = "start"
      start_flag = 1
    else:
      if (index < df.shape[0]-1 and df.loc[index+1, 'val'].isnull()) or index == df.shape[0]-1:
         df.loc[index, 'flag'] = "stop"

answered Nov 24 '21 at 17:22

Zen

115
7

1

please do not use iteration unless you have no other options. This would be a bad idea to iterate through all the rows. It is costly and slow – Joe Ferndz Nov 25 '21 at 03:58

score 1 · Answer 3 · answered Nov 24 '21 at 17:23

Here's what you need:

# Auxiliar columns to detect start and end
df['Past'] = df['Value'].shift(-1)
df['Future'] = df['Value'].shift(1)

# Auxiliar function to complete new column
def Find_start_stop_Null(row):
    flag = np.nan
    if ((not pd.isnull(row['Value'])) and (pd.isnull(row['Future']))):
        flag = 'start'
    elif ((not pd.isnull(row['Value'])) and (pd.isnull(row['Past']))):
        flag = 'stop'
    return flag

df['flag'] = df.apply(lambda row: Find_start_stop_Null(row), axis=1)
# Drop unnecessary columns
df = df.drop('Past', axis=1)
df = df.drop('Future', axis=1)

pandas find start and stop point of non-null values

3 Answers3