
I have the following snippet from an example dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame({'location': ['Seattle', np.nan, 'Portland', 'San Francisco'],
                   'time': ['2022-06-01 12:00:00', '2022-06-01 13:00:00', '2022-06-01 14:00:00', '2022-06-01 15:00:00']})

I would like to retrieve the rows where location is NaN, as well as the non-NaN rows directly above and below them.

The expected result is:

df = pd.DataFrame({'location': ['Seattle', np.nan, 'Portland'], 'time': ['2022-06-01 12:00:00', '2022-06-01 13:00:00', '2022-06-01 14:00:00']})

How can I achieve this? The real dataframe is larger than this example and contains different cases, but the general rule should be: retrieve all rows with NaN, plus the nearest non-NaN row above and below each of them.

mabiel

3 Answers

I would use a centered rolling window to generate the mask for boolean indexing:

N = 1
m = (df['location'].isna()
      .rolling(2*N+1, min_periods=1, center=True)
      .max().eq(1)
     )

out = df.loc[m]

You can easily generalize to any number of rows before/after by changing N.

Output:

   location                 time
0   Seattle  2022-06-01 12:00:00
1       NaN  2022-06-01 13:00:00
2  Portland  2022-06-01 14:00:00
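
To illustrate how changing N widens the selection, here is a minimal sketch on a hypothetical larger frame. The `value` column, the extra city names, and the helper name `rows_around_nan` are assumptions for illustration, not part of the question. The explicit `astype(int)` is only there to make the numeric input to `rolling` visible.

```python
import numpy as np
import pandas as pd

# Hypothetical 7-row frame (not from the question) with one NaN at index 3.
df = pd.DataFrame({
    'location': ['Seattle', 'Tacoma', 'Olympia', np.nan,
                 'Portland', 'Salem', 'Eugene'],
    'value': range(7),
})

def rows_around_nan(frame, column, n=1):
    """Return rows where `column` is NaN plus up to `n` rows on each side."""
    mask = (frame[column].isna()
            .astype(int)  # 1 where NaN, 0 elsewhere
            .rolling(2 * n + 1, min_periods=1, center=True)
            .max()        # 1 if any row in the centered window is NaN
            .eq(1))
    return frame.loc[mask]

out1 = rows_around_nan(df, 'location', n=1)  # index 2, 3, 4
out2 = rows_around_nan(df, 'location', n=2)  # index 1, 2, 3, 4, 5
```

Note that the window counts rows, not non-NaN rows: if several NaN rows are adjacent, each of them is kept and the selection extends n rows beyond the outermost ones.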
mozway

Use boolean indexing, chaining the masks with | (bitwise OR):

m = df['location'].isna()

df = df[m.shift(fill_value=False) | m.shift(-1, fill_value=False) | m]
print(df)
   location                 time
0   Seattle  2022-06-01 12:00:00
1       NaN  2022-06-01 13:00:00
2  Portland  2022-06-01 14:00:00
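
The same shift-based mask can be extended to more than one neighbouring row by OR-ing additional shifts in a loop. This is a sketch under assumptions: the parameter `n` and the names `keep` and `result` are introduced here for illustration and do not appear in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'location': ['Seattle', np.nan, 'Portland', 'San Francisco'],
                   'time': ['2022-06-01 12:00:00', '2022-06-01 13:00:00',
                            '2022-06-01 14:00:00', '2022-06-01 15:00:00']})

m = df['location'].isna()
n = 1  # hypothetical: number of rows to keep on each side of a NaN row

keep = m.copy()
for k in range(1, n + 1):
    # shift(k) marks the row k positions below a NaN,
    # shift(-k) marks the row k positions above it.
    keep |= m.shift(k, fill_value=False) | m.shift(-k, fill_value=False)

result = df[keep]  # rows 0, 1 and 2 for n = 1
```

Assigning to a new name such as `result` keeps the original `df` available, whereas the answer's one-liner overwrites it.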
jezrael

Here is a way using ffill() and bfill(), where n is the number of rows to keep on each side:

n = 1
s = df['location'].isna()
df.loc[s.where(s).ffill(limit=n).bfill(limit=n).fillna(False)]

Output:

   location                 time
0   Seattle  2022-06-01 12:00:00
1       NaN  2022-06-01 13:00:00
2  Portland  2022-06-01 14:00:00
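
To make the fill-based idea easier to follow, the sketch below breaks it into steps. It is a variation rather than the exact code above: it marks NaN rows with 1.0 in a float Series and finishes with `notna()`, so the final mask has a plain bool dtype. The intermediate names `marked`, `spread`, and `mask` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'location': ['Seattle', np.nan, 'Portland', 'San Francisco'],
                   'time': ['2022-06-01 12:00:00', '2022-06-01 13:00:00',
                            '2022-06-01 14:00:00', '2022-06-01 15:00:00']})

n = 1
s = df['location'].isna()

# Step 1: 1.0 at rows where location is NaN, NaN everywhere else.
marked = s.astype(float).where(s)

# Step 2: spread the marker at most n rows down, then at most n rows up.
spread = marked.ffill(limit=n).bfill(limit=n)

# Step 3: every row that received a marker is part of the selection.
mask = spread.notna()

out = df.loc[mask]  # rows 0, 1 and 2 for n = 1
```

The `limit` argument is what bounds the selection: without it, the fills would propagate the marker across the whole column.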
rhug123