Get row with NaN as well as preceding and following non-NaN row

Question

I have the following snippet from an example dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame({'location': ['Seattle', np.nan, 'Portland', 'San Francisco'],
                   'time': ['2022-06-01 12:00:00', '2022-06-01 13:00:00', '2022-06-01 14:00:00', '2022-06-01 15:00:00']})

I would like to retrieve the rows where location is NaN, as well as the non-NaN rows directly above and below them.

The result should look like this:

df = pd.DataFrame({'location': ['Seattle', np.nan, 'Portland'], 'time': ['2022-06-01 12:00:00', '2022-06-01 13:00:00', '2022-06-01 14:00:00']})

How can I achieve this? The real dataframe is larger than the example snippet and contains different cases, but in general it should be: retrieve all rows with NaN plus the nearest non-NaN row above and below.

score 2 · Answer 1 · answered Jun 16 '23 at 10:33

I would use a centered rolling window to generate the mask for boolean indexing:

N = 1
m = (df['location'].isna()
      .rolling(2*N+1, min_periods=1, center=True)
      .max().eq(1)
     )

out = df.loc[m]

You can easily generalize to any number of rows before/after by changing N.

Output:

   location                 time
0   Seattle  2022-06-01 12:00:00
1       NaN  2022-06-01 13:00:00
2  Portland  2022-06-01 14:00:00
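
To illustrate how changing N widens the selection, here is a minimal sketch that wraps the same rolling mask in a helper. The function name `rows_near_nan`, the `demo` frame, and the explicit `astype(int)` cast are assumptions added for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd


def rows_near_nan(df, column, n=1):
    """Return rows where `column` is NaN plus up to `n` rows on either side.

    Hypothetical helper: same centered-rolling idea as above, wrapped in a function.
    """
    # 1 where the value is missing, 0 otherwise (cast so rolling sees numbers).
    is_nan = df[column].isna().astype(int)
    # A centered window of width 2*n+1 has max == 1 if any row in it is NaN.
    # min_periods=1 keeps the windows at the edges of the frame usable.
    mask = is_nan.rolling(2 * n + 1, min_periods=1, center=True).max().eq(1)
    return df.loc[mask]


# Hypothetical data with a single NaN in the middle.
demo = pd.DataFrame({'location': ['A', 'B', 'C', np.nan, 'D', 'E', 'F']})

near_1 = rows_near_nan(demo, 'location', n=1)  # NaN row plus one row each side
near_2 = rows_near_nan(demo, 'location', n=2)  # NaN row plus two rows each side
```

With n=1 the helper keeps the NaN row and its immediate neighbours; with n=2 it keeps two rows on each side.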

score 1 · Accepted Answer · answered Jun 16 '23 at 10:32

Use boolean indexing with masks chained by | for bitwise OR:

m = df['location'].isna()

df = df[m.shift(fill_value=False) | m.shift(-1, fill_value=False) | m]
print(df)
   location                 time
0   Seattle  2022-06-01 12:00:00
1       NaN  2022-06-01 13:00:00
2  Portland  2022-06-01 14:00:00
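
The shifted masks can also be OR-ed over a range of offsets if more than one neighbouring row is wanted. The following is a sketch under that assumption; the helper name `rows_near_nan_shift` and the sample frames are hypothetical.

```python
from functools import reduce

import numpy as np
import pandas as pd


def rows_near_nan_shift(df, column, n=1):
    """Hypothetical helper: OR together the NaN mask shifted by -n..n rows."""
    m = df[column].isna()
    # shift(k) moves the mask k rows down (up for negative k);
    # fill_value=False keeps the dtype boolean at the edges of the frame.
    shifted = [m.shift(k, fill_value=False) for k in range(-n, n + 1)]
    mask = reduce(lambda a, b: a | b, shifted)
    return df[mask]


# Hypothetical data: one NaN in the middle, and one frame starting with NaN.
demo = pd.DataFrame({'location': ['A', 'B', 'C', np.nan, 'D', 'E', 'F']})
edge = pd.DataFrame({'location': [np.nan, 'A', 'B']})

middle_rows = rows_near_nan_shift(demo, 'location', n=1)
edge_rows = rows_near_nan_shift(edge, 'location', n=1)
```

For n=1 this reduces to the three-mask expression above; a NaN in the first or last row simply has no neighbour on that side.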

score 0 · Answer 3 · answered Jun 18 '23 at 17:04

Here is a way using ffill() and bfill() with a limit:

n = 1
s = df['location'].isna()
df.loc[s.where(s).ffill(limit=n).bfill(limit=n).fillna(False)]

Output:

   location                 time
0   Seattle  2022-06-01 12:00:00
1       NaN  2022-06-01 13:00:00
2  Portland  2022-06-01 14:00:00
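
As a sanity check, the ffill()/bfill() mask can be compared with the shift-based mask on a frame that has a leading NaN and two consecutive NaNs. This is only a sketch: the `demo` frame is made up, and the trailing `astype(bool)` is an addition assumed here to get a plain boolean mask regardless of pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN in the first row and a run of two NaNs later on.
demo = pd.DataFrame(
    {'location': [np.nan, 'A', 'B', 'C', np.nan, np.nan, 'D', 'E', 'F']}
)

n = 1
s = demo['location'].isna()

# Keep True where the value is missing and NaN elsewhere, then spread each
# True at most n rows forward and at most n rows backward.
mask_fill = (
    s.where(s).ffill(limit=n).bfill(limit=n).fillna(False).astype(bool)
)

# Shift-based mask for the same n=1 neighbourhood.
mask_shift = s.shift(fill_value=False) | s.shift(-1, fill_value=False) | s

selected = demo.loc[mask_fill]
same_rows = mask_fill.equals(mask_shift)
```

On this frame both masks select the same rows; row 2 is dropped because it is two rows away from the nearest NaN on either side.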
