-1

I have hundreds of columns in a DataFrame and would like to drop rows where multiple columns are NaN. Meaning entire row is NaN for those columns.

I have tried to slice columns but the code is taking forever to run.

df = df.drop(df[(df.loc[:,'col1':'col100'].isna()) & (df.loc[:,'col120':'col220'].isna())].index)

Appreciate any help.

M A
  • 73
  • 1
  • 2
  • 10

2 Answers2

1

Part of your original question reads: "... would like to drop rows where multiple columns are NaN. Meaning entire row is NaN for those columns. "

Can I interpret this as, you want to delete the row when the entire row has NaNs. If that is true you should be able to achive this by:

df.dropna(axis = 'rows', how = 'all', inplace = True)

If that is not the case then I misunderstood your question.

theDBA
  • 239
  • 1
  • 5
0

You should try to use the dropna() function with the subset parameter equal to the columns you are trying to drop on. Here is a short example taken from Pandas' documentation

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})

df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT

df.dropna(subset=['name', 'born'])

This gives you the following:

       name        toy       born
1    Batman  Batmobile 1940-04-25
Sabri B
  • 51
  • 3
  • I wanted to avoid this as I have quite a major list of columns to check. Some 200 columns must be checked. – M A Mar 27 '20 at 15:53
  • Also I don't want to drop individual NA values, but where the entire row has NA – M A Mar 27 '20 at 16:30
  • Hi ! You could create a list of column names such that : `col_names=df.loc[:,'col1':'col100'].columns + df.loc[:,'col120':'col220'].columns` and then apply the dropna above. As you can see in the above example, using the `subset` will by default drop any row where at least one column is NaN. Yet you can choose to drop only if all columns are NaN by using `how='all`. Hope this will help you – Sabri B Mar 30 '20 at 14:07