
I have a large dataframe that I want to split at the rows where all columns are NaN or otherwise non-finite. I am looking for something similar to the post Drop rows of pandas dataframe that don't have finite values in certain variable(s), but rather than dropping those rows I'd like to split on them.

I am currently on pandas 0.16.0

dlwlrma
  • does `df[df.apply(lambda x: x.isnull().all(), axis=1)]` work? – EdChum Feb 23 '16 at 15:39
  • Also doesn't `df.dropna(how='all')` return you this? – EdChum Feb 23 '16 at 15:42
  • @EdChum absolutely perfect. Thank you. The dropna returns the dataframe without the NaNs, not the rows with the NaNs. – dlwlrma Feb 23 '16 at 15:42
  • Which worked, the first suggestion?, it will be slow for a large df, not sure if it's quicker to do `df.loc[df.index.difference(df.dropna(how='all').index)]` – EdChum Feb 23 '16 at 15:45

2 Answers


As @EdChum has pointed out

df[df.apply(lambda x: x.isnull().all(), axis=1)]

does the trick.
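Applied to the actual splitting the question asks about, here's a minimal sketch. It uses the vectorized `df.isnull().all(axis=1)` (equivalent to the `apply` version above, just faster) and groups on the cumulative sum of the mask; the `groupby` split is my own choice, not something from the answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, np.nan, 3, 4, np.nan, 6],
                   'b': [1, np.nan, 3, 4, np.nan, 6]})

# Boolean mask of rows where every column is NaN
all_nan = df.isnull().all(axis=1)

# Each all-NaN row bumps the cumulative sum, so rows between
# separators share a group label; drop the separator rows themselves
pieces = [g for _, g in df[~all_nan].groupby(all_nan.cumsum())]
# pieces is a list of three sub-dataframes here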

dlwlrma

It'll be quicker to find the all-NaN rows of your df by calling index.difference with the index labels returned from dropna:

In [69]:
df = pd.DataFrame({'a':[0,np.NaN, 0], 'b':[np.NaN, np.NaN, 1]})
df = pd.concat([df]*10000, ignore_index=True)   

%timeit df[df.apply(lambda x: x.isnull().all(), axis=1)]
%timeit df.loc[df.index.difference(df.dropna(how='all').index)]

1 loops, best of 3: 2.82 s per loop
100 loops, best of 3: 8.95 ms per loop

You can see that for a 30k-row df, the latter method is much faster.
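As a runnable sketch of the faster approach (same toy data as the timing above): dropna(how='all') keeps every row with at least one non-NaN value, so the index labels it drops are exactly the all-NaN rows.

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, np.nan, 0], 'b': [np.nan, np.nan, 1]})
df = pd.concat([df] * 10000, ignore_index=True)

# Labels present in df but absent after dropna -> rows where every column is NaN
nan_labels = df.index.difference(df.dropna(how='all').index)
all_nan_df = df.loc[nan_labels]
# one all-NaN row per original 3-row block, i.e. 10000 rows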

EdChum