
(This question can probably be generalized to filtering any Boolean Pandas series, but nothing that I can find on that subject addresses my issue.)

Given this dataframe:

import pandas as pd

df = pd.DataFrame({'a': (1, None, 3), 'b': (4, 5, 6), 'c': (7, 8, None), 'd': (10, 11, 12)})
df
     a  b    c   d
0  1.0  4  7.0  10
1  NaN  5  8.0  11
2  3.0  6  NaN  12

I need to get a list of column names that have NaN values in them (my real dataset has 80+ columns and for cleaning purposes I only want to focus on anything with NaN for the time being). This will give me a full Boolean list:

df.isnull().any()
a     True
b    False
c     True
d    False
dtype: bool

Ideally I only want:

a     True
c     True

I cannot figure out how to do that. A mask comes close, but it is applied along the rows:

mask = df.isnull().values
df[mask]
     a  b    c   d
1  NaN  5  8.0  11
2  3.0  6  NaN  12

Is there a way to apply them to the column axis instead, or is there a better way to do what I'm looking for?

jml
  • Like the dupe says, `df.isna().any().loc[lambda x: x]` is the shortest way for your specific output – cs95 Jan 05 '20 at 20:18
  • Try formulating the problem from the POV of what you _want_ to do ("I want to find columns that have...") as opposed to what you're trying to do that isn't working. Just a tip. Cheers – cs95 Jan 05 '20 at 20:27
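As a quick check, the one-liner from the first comment can be run on the example frame; a minimal sketch using the same data as the question:

```python
import pandas as pd

df = pd.DataFrame({'a': (1, None, 3), 'b': (4, 5, 6),
                   'c': (7, 8, None), 'd': (10, 11, 12)})

# Passing a callable to .loc keeps only the entries where it returns True,
# filtering the Boolean Series down to the columns that contain NaN.
nan_cols = df.isna().any().loc[lambda x: x]
print(nan_cols)
# a    True
# c    True
# dtype: bool
```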

1 Answer


You can perform indexing on the columns with your mask:

>>> df.columns[df.isnull().any()]
Index(['a', 'c'], dtype='object')
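Since the question asks for a list of column names, note that the `Index` this returns can be converted to a plain Python list with `.tolist()`; a small sketch with the same frame:

```python
import pandas as pd

df = pd.DataFrame({'a': (1, None, 3), 'b': (4, 5, 6),
                   'c': (7, 8, None), 'd': (10, 11, 12)})

# Index objects support .tolist() for a plain Python list of labels
cols_with_nan = df.columns[df.isnull().any()].tolist()
print(cols_with_nan)  # ['a', 'c']
```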

Or if you want to show the data for the given columns:

>>> df[df.columns[df.isnull().any()]]
     a    c
0  1.0  7.0
1  NaN  8.0
2  3.0  NaN
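Equivalently, the Boolean Series can be passed straight to `.loc` on the column axis, skipping the indirection through `df.columns`; a sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({'a': (1, None, 3), 'b': (4, 5, 6),
                   'c': (7, 8, None), 'd': (10, 11, 12)})

# .loc accepts a Boolean Series aligned against the column labels
subset = df.loc[:, df.isnull().any()]
print(subset.columns.tolist())  # ['a', 'c']
```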
Willem Van Onsem
    I didn't realize you could index .columns like that, but I knew I had to be missing something straightforward. Good to know for the future. – jml Jan 05 '20 at 20:26