3

Searched and tried several answers here on SO, but they are all for returning rows with NaN's. I'd like to return only the columns with NaN values. For example the following df. How can I select columns 'A' and 'LG'?

df = pd.DataFrame(
        {'H': ['a','b', 'c'],
         'A': [np.nan,'d', 'e'],
         'LG':['AR1', 'RO1', np.nan],
         })

print(df)

     A  H   LG
0  NaN  a  AR1
1    d  b  RO1
2    e  c  NaN
Zanshin
  • 1,262
  • 1
  • 14
  • 30

2 Answers2

4

I think you need first replace strings NaN to np.nan in sample:

df = pd.DataFrame(
        {'H': ['a','b', 'c'],
         'A': [np.nan,'d', 'e'],
         'LG':['AR1', 'RO1', np.nan],
         })

Then check by isnull and any:

mask = df.isnull().any()
print (mask)
A      True
H     False
LG     True
dtype: bool

Last use boolean indexing with index:

print (mask.index[mask])
Index(['A', 'LG'], dtype='object')

If need columns add loc:

print (df.loc[:, mask])
     A   LG
0  NaN  AR1
1    d  RO1
2    e  NaN
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

This returns all columns containing NaN.

df = pd.DataFrame(
        {'H': ['a','b', 'c'],
         'A': [np.nan,'d', 'e'],
         'LG':['AR1', 'RO1', np.nan],
         })

x = pd.isnull(df).any()

print(df[x.index[x]])

     A   LG
0  NaN  AR1
1    d  RO1
2    e  Na
Michael Gecht
  • 1,374
  • 1
  • 17
  • 26