
I have a list of DataFrames, some of which contain NaN values. So far I can identify NaN values within a single DataFrame using this link.

How can I find the indices of the list at which a DataFrame has NaN values?

Sample list of dffs:

[        var1       var2
    14.171250  13.593813
    13.578317  13.595329
    10.301850  13.580139
     9.930217        NaN
     6.192517  13.561943
          NaN  13.565149
     6.197983  13.572509,

         var1      var2
     2.456183  5.907528
     5.052017  5.955731
     5.960000  5.972480
     8.039317  5.984608
     7.559217  5.985348
     6.933633  5.979438,

         var1       var2
    14.171250  23.593813
    23.578317  23.595329
    56.301850  23.580139
    90.930217  22.365676
    89.192517  33.561943
    86.23654   53.565149
          NaN  13.572509,
 ...]

I need the result as a list of the indices, 0 and 2, of the DataFrames that have NaN values.

So far I have tried this:

df_with_nan = []
for df in dffs:
    # collects the NaN-containing column names, not the list indices
    df_with_nan.append(df.columns[df.isnull().any()])

The for loop above gives me the column names, var1 and var2. However, I need the indices of those DataFrames within the list as I loop through it. Any help or suggestions would be great.

i.n.n.m
  • When you say indices... you mean the indices of the list? – cs95 Jul 25 '17 at 18:05
  • @cᴏʟᴅsᴘᴇᴇᴅ yes, index of the list. So, I can identify data frames in that index. – i.n.n.m Jul 25 '17 at 18:06
  • You're almost there... try `if df.isnull().any().max():` Use enumerate, append the df index in the if block. – cs95 Jul 25 '17 at 18:07
  • Possible duplicate of [Find Indexes of Non-NaN Values in Pandas DataFrame](https://stackoverflow.com/questions/41150238/find-indexes-of-non-nan-values-in-pandas-dataframe) – Shihe Zhang Oct 27 '17 at 03:05

2 Answers


You can use a conditional list comprehension to enumerate over all dataframes in your list and return the enumerated index value of those that contain any null values.

df_with_nan = [n for n, df in enumerate(dffs) if sum(df.isnull().any())]
Alexander
  • thank you, the list comprehension way works. Could you explain why you used `sum` here? – i.n.n.m Jul 25 '17 at 18:10
  • 2
    Using `sum` gives you the total number of columns that contain a `nan` value. Anything greater than zero evaluates to `True`. You could also use `df.isnull().any().any()`. – Alexander Jul 25 '17 at 18:16
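As a quick sanity check, here is a self-contained sketch using small hypothetical frames (stand-ins for the asker's data, not the actual values above) showing the comprehension picking out exactly the positions of the frames that contain NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's list of DataFrames
dffs = [
    pd.DataFrame({"var1": [1.0, np.nan], "var2": [2.0, 3.0]}),  # has NaN
    pd.DataFrame({"var1": [1.0, 2.0], "var2": [2.0, 3.0]}),     # clean
    pd.DataFrame({"var1": [np.nan, 2.0], "var2": [2.0, 3.0]}),  # has NaN
]

# sum() of the per-column "any null" booleans is truthy iff any column has a NaN
df_with_nan = [n for n, df in enumerate(dffs) if sum(df.isnull().any())]
print(df_with_nan)  # [0, 2]
```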

You're almost there... just use enumerate to loop with indices, and df.isnull().values.any() (faster than df.isnull().any().max()) to test:

df_with_nan = []
for i, df in enumerate(dffs):
    if df.isnull().values.any():
        df_with_nan.append(i)

Granted, a list comp is shorter, but go for whatever you prefer.
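For completeness, a runnable sketch of this loop on hypothetical data (the frame contents are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's list of DataFrames
dffs = [
    pd.DataFrame({"var1": [1.0, np.nan]}),  # has NaN
    pd.DataFrame({"var1": [1.0, 2.0]}),     # clean
    pd.DataFrame({"var1": [np.nan, 2.0]}),  # has NaN
]

df_with_nan = []
for i, df in enumerate(dffs):
    # .values.any() tests the underlying ndarray in a single pass
    if df.isnull().values.any():
        df_with_nan.append(i)

print(df_with_nan)  # [0, 2]
```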

cs95