Remove Row if NaN in First Five Columns

Question

I have a pandas dataframe with dimensions 89 rows by 13 columns. I want to remove an entire row if NaN appears within the first five columns. Here is an example.

LotName     C15   C16  C17  C18  C19 Spots15 Spots16 ...
Cherry St   439   464  555  239  420     101     101 ...
Springhurst NaN   NaN  NaN  NaN  NaN      12      12
Barton Lot   34    24   43   45   39      10       9 ...

In the above example, I would want to remove the Springhurst observation, as it contains NaN within the first five columns. How would I be able to do this in Python?

Mayank Porwal · Accepted Answer · 2020-10-25T20:00:28.423

7

If you only want to drop rows where all of the first 5 columns are NaN:

df.iloc[:, :5].dropna(how='all')

Explanation:

df.iloc[:, :5] : selects all rows and the first 5 columns

.dropna(how='all') : drops a row only if all of its values are NaN

If you want to drop rows that have NaN in any of the first 5 columns:

df.iloc[:, :5].dropna(how='any')

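The expressions above return only the first 5 columns. To remove the rows from the original df, collect the index of the rows that survive and select them with loc (swap how='all' for how='any' to drop rows with any NaN):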

In [2107]: ix = df.iloc[:, :5].dropna(how='all').index.tolist()

In [2110]: df = df.loc[ix]

In [2111]: df
Out[2111]: 
       LotName    C15    C16    C17    C18  C19  Spots15  Spots16
Cherry      St  439.0  464.0  555.0  239.0  420      101    101.0
Barton     Lot   34.0   24.0   43.0   45.0   39       10      9.0
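
As a possible alternative to collecting the index first, dropna accepts a subset argument, so the check can be limited to the first five columns while the whole row is dropped. This is a minimal sketch, not the answerer's code: the frame is rebuilt from the three rows in the question, and LotName is assumed to be the index so that C15 to C19 are the first five columns.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's example rows.
# Assumption: LotName is the index, so C15..C19 are the first five columns.
df = pd.DataFrame(
    {
        "C15": [439, np.nan, 34],
        "C16": [464, np.nan, 24],
        "C17": [555, np.nan, 43],
        "C18": [239, np.nan, 45],
        "C19": [420, np.nan, 39],
        "Spots15": [101, 12, 10],
        "Spots16": [101, 12, 9],
    },
    index=pd.Index(["Cherry St", "Springhurst", "Barton Lot"], name="LotName"),
)

# Labels of the first five columns, selected by position.
first_five = list(df.columns[:5])

# how="any" drops a row if at least one of the listed columns is NaN;
# how="all" would drop it only if every listed column is NaN.
cleaned = df.dropna(subset=first_five, how="any")
print(cleaned)
```

Because dropna returns a new frame, assign the result back (df = cleaned) if the original name should refer to the filtered data.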

edited Oct 25 '20 at 20:00

answered Oct 25 '20 at 19:48


Will this statement remove the rows from the original dataframe? – Bill Oct 25 '20 at 19:52
1

@Bill I've updated my answer to remove rows from original df. Please have a look. – Mayank Porwal Oct 25 '20 at 19:57

score 3 · Answer 2 · answered Oct 25 '20 at 19:48

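You can use iloc to select your columns, notna() to flag values that are not NaN, and any(axis=1) to check whether at least one value in each row of the selection is True. The mask below therefore keeps a row if any of the first five columns holds a value, so it drops only rows where all five are NaN. To drop rows that have NaN in any of those columns, use .all(axis=1) instead.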

mask = df.iloc[:,:5].notna().any(axis=1)
df[mask]

Output:

              C15    C16    C17    C18    C19  Spots15 Spots16 ...
LotName                                                           
Cherry St   439.0  464.0  555.0  239.0  420.0      101     101 ...
Barton Lot   34.0   24.0   43.0   45.0   39.0       10       9 ...
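
The difference between any and all on the notna() mask only shows up when a row is partly filled. A small sketch, assuming LotName is the index; the "Partial Lot" row is hypothetical and was added only to illustrate the partial case, the other rows come from the question.

```python
import numpy as np
import pandas as pd

# Rows from the question plus one hypothetical, partly filled row.
df = pd.DataFrame(
    {
        "C15": [439.0, np.nan, 34.0, 50.0],
        "C16": [464.0, np.nan, 24.0, np.nan],  # "Partial Lot" has NaN here
        "C17": [555.0, np.nan, 43.0, 52.0],
        "C18": [239.0, np.nan, 45.0, 53.0],
        "C19": [420.0, np.nan, 39.0, 54.0],
        "Spots15": [101, 12, 10, 20],
    },
    index=pd.Index(
        ["Cherry St", "Springhurst", "Barton Lot", "Partial Lot"],
        name="LotName",
    ),
)

not_nan = df.iloc[:, :5].notna()

# Keep rows with at least one value: drops only rows that are NaN throughout.
keep_if_any_value = not_nan.any(axis=1)

# Keep rows with every value present: drops rows with any NaN.
keep_if_all_values = not_nan.all(axis=1)

print(df[keep_if_any_value])
print(df[keep_if_all_values])
```

With any, the partly filled row survives; with all, it is removed along with Springhurst.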

score 2 · Answer 3 · answered Oct 25 '20 at 19:51

Another solution: here you can specify the columns by label, from C15 to C19 (label slices with loc include both endpoints), and then filter out all rows that have any NaN in them:

print( df[~df.loc[:, 'C15':'C19'].isna().any(axis=1)] )

Prints:

      LotName    C15    C16    C17    C18    C19  Spots15  Spots16
0   Cherry St  439.0  464.0  555.0  239.0  420.0      101      101
2  Barton Lot   34.0   24.0   43.0   45.0   39.0       10        9
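
The label-based filter can also be wrapped in a small function. This is only a sketch: drop_rows_with_nan_between is a hypothetical helper name, and the frame is rebuilt from the question's rows with LotName as an ordinary column and a default integer index, as in the output above.

```python
import numpy as np
import pandas as pd


def drop_rows_with_nan_between(frame, first_col, last_col):
    """Hypothetical helper: drop rows with NaN in columns first_col..last_col."""
    # .loc label slices include both endpoints.
    has_nan = frame.loc[:, first_col:last_col].isna().any(axis=1)
    return frame[~has_nan]


# Sample frame rebuilt from the question's example rows.
df = pd.DataFrame(
    {
        "LotName": ["Cherry St", "Springhurst", "Barton Lot"],
        "C15": [439, np.nan, 34],
        "C16": [464, np.nan, 24],
        "C17": [555, np.nan, 43],
        "C18": [239, np.nan, 45],
        "C19": [420, np.nan, 39],
        "Spots15": [101, 12, 10],
        "Spots16": [101, 12, 9],
    }
)

result = drop_rows_with_nan_between(df, "C15", "C19")
print(result)

# The surviving rows keep their original labels (0 and 2);
# reset_index(drop=True) renumbers them from 0 if that is preferred.
renumbered = result.reset_index(drop=True)
```

Selecting by label rather than position keeps the filter correct if columns are later reordered or if LotName is a regular column rather than the index.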
