I'd like to find the start and stop points of a column and flag them as shown below:
value | flag |
---|---|
NaN | NaN |
NaN | NaN |
1 | start |
2 | NaN |
1 | NaN |
3 | NaN |
2 | stop |
NaN | NaN |
1 | start |
2 | stop |
I'd like to find the start and stop points of a column and flag them as shown below:
value | flag |
---|---|
NaN | NaN |
NaN | NaN |
1 | start |
2 | NaN |
1 | NaN |
3 | NaN |
2 | stop |
NaN | NaN |
1 | start |
2 | stop |
start
occurs when the current value is notnull
and the previous value isnull
stop
occurs when the current value is notnull
and the next value isnull
Generate these conditions using shift
and assign using loc
:
start = df.value.notnull() & df.value.shift().isnull()
stop = df.value.notnull() & df.value.shift(-1).isnull()
df.loc[start, 'flag'] = 'start'
df.loc[stop, 'flag'] = 'stop'
# value flag
# 0 NaN NaN
# 1 NaN NaN
# 2 1.0 start
# 3 2.0 NaN
# 4 1.0 NaN
# 5 3.0 NaN
# 6 2.0 stop
# 7 NaN NaN
# 8 1.0 start
# 9 2.0 stop
Alternatively assign using mask
:
df['flag'] = df['flag'].mask(start, 'start')
df['flag'] = df['flag'].mask(stop, 'stop')
Here I iterated through the rows and used a flag to indicate if we were starting or not.
start_flag = 0
for index, row in df.iterrows():
if row['val'].isnull():
df.loc[index, 'flag'] = "NaN"
start_flag = 0
else:
if start_flag == 0:
df.loc[index, 'flag'] = "start"
start_flag = 1
else:
if (index < df.shape[0]-1 and df.loc[index+1, 'val'].isnull()) or index == df.shape[0]-1:
df.loc[index, 'flag'] = "stop"
Here's what you need:
# Auxiliar columns to detect start and end
df['Past'] = df['Value'].shift(-1)
df['Future'] = df['Value'].shift(1)
# Auxiliar function to complete new column
def Find_start_stop_Null(row):
flag = np.nan
if ((not pd.isnull(row['Value'])) and (pd.isnull(row['Future']))):
flag = 'start'
elif ((not pd.isnull(row['Value'])) and (pd.isnull(row['Past']))):
flag = 'stop'
return flag
df['flag'] = df.apply(lambda row: Find_start_stop_Null(row), axis=1)
# Drop unnecessary columns
df = df.drop('Past', axis=1)
df = df.drop('Future', axis=1)