
I have a list of DataFrames, some of which contain NaN values. So far I can identify NaN values within a single DataFrame using this link.

How can I find the indices of the list at which a DataFrame has NaN values?

Sample list of dffs:

[        var1       var2
    14.171250  13.593813
    13.578317  13.595329
    10.301850  13.580139
     9.930217        NaN
     6.192517  13.561943
          NaN  13.565149
     6.197983  13.572509,

         var1      var2
     2.456183  5.907528
     5.052017  5.955731
     5.960000  5.972480
     8.039317  5.984608
     7.559217  5.985348
     6.933633  5.979438,

         var1       var2
    14.171250  23.593813
    23.578317  23.595329
    56.301850  23.580139
    90.930217  22.365676
    89.192517  33.561943
    86.23654   53.565149
          NaN  13.572509,
 ...]

I need the result as a list of the indices, 0 and 2, of the DataFrames that have NaN values.

So far I have tried this:

df_with_nan = []
for df in dffs:
    # collects the NaN-containing column names, not the list indices
    df_with_nan.append(df.columns[df.isnull().any()])

The for loop above gives me the column names, var1 and var2. However, I need the indices of those DataFrames within the list as I loop through it. Any help or suggestions would be great.

i.n.n.m
  • When you say indices... you mean the indices of the list? – cs95 Jul 25 '17 at 18:05
  • @cᴏʟᴅsᴘᴇᴇᴅ yes, index of the list. So, I can identify data frames in that index. – i.n.n.m Jul 25 '17 at 18:06
  • You're almost there... try `if df.isnull().any().max():` Use enumerate, append the df index in the if block. – cs95 Jul 25 '17 at 18:07
  • Possible duplicate of [Find Indexes of Non-NaN Values in Pandas DataFrame](https://stackoverflow.com/questions/41150238/find-indexes-of-non-nan-values-in-pandas-dataframe) – Shihe Zhang Oct 27 '17 at 03:05

2 Answers


You can use a conditional list comprehension to enumerate over all dataframes in your list and return the enumerated index value of those that contain any null values.

df_with_nan = [n for n, df in enumerate(dffs) if sum(df.isnull().any())]
Alexander
  • thank you, the list comprehension way works. Could you explain why you used `sum` here? – i.n.n.m Jul 25 '17 at 18:10
  • 2
    Using `sum` gives you the total number of columns that contain a `nan` value. Anything greater than zero evaluates to `True`. You could also use `df.isnull().any().any()`. – Alexander Jul 25 '17 at 18:16
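As a quick sanity check, here is a self-contained sketch using small hypothetical frames (stand-ins for the asker's data, not the actual values above) showing the comprehension picking out exactly the positions of the frames that contain NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's list of DataFrames
dffs = [
    pd.DataFrame({"var1": [1.0, np.nan], "var2": [2.0, 3.0]}),  # has NaN
    pd.DataFrame({"var1": [1.0, 2.0], "var2": [2.0, 3.0]}),     # clean
    pd.DataFrame({"var1": [np.nan, 2.0], "var2": [2.0, 3.0]}),  # has NaN
]

# sum() of the per-column "any null" booleans is truthy iff any column has a NaN
df_with_nan = [n for n, df in enumerate(dffs) if sum(df.isnull().any())]
print(df_with_nan)  # [0, 2]
```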

You're almost there... just use enumerate to loop with indices, and df.isnull().values.any() (faster than df.isnull().any().max()) to test:

df_with_nan = []
for i, df in enumerate(dffs):
    if df.isnull().values.any():
        df_with_nan.append(i)

Granted, a list comp is shorter, but go for whatever you prefer.
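For completeness, a runnable sketch of this loop on hypothetical data (the frame contents are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's list of DataFrames
dffs = [
    pd.DataFrame({"var1": [1.0, np.nan]}),  # has NaN
    pd.DataFrame({"var1": [1.0, 2.0]}),     # clean
    pd.DataFrame({"var1": [np.nan, 2.0]}),  # has NaN
]

df_with_nan = []
for i, df in enumerate(dffs):
    # .values.any() tests the underlying ndarray in a single pass
    if df.isnull().values.any():
        df_with_nan.append(i)

print(df_with_nan)  # [0, 2]
```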

cs95