Remove NaN row from X array and also the corresponding row in Y

Question

I have an X array that contains NaN values, and I can remove the rows with NaN like this:

import numpy as np
x = x[~np.isnan(x)]

But I have a corresponding Y array

assert len(x) == len(y) # True
x = x[~np.isnan(x)]
assert len(x) == len(y) # False and breaks

How do I remove the corresponding rows from the Y array?

My X array looks like this:

>>> x
[[ 2.67510434  2.67521927  3.49296989  3.80100625  4.          2.83631844]
 [ 3.47538057  3.4752436   3.62245715  4.0720535   5.          3.7773169 ]
 [ 2.6157049   2.61583852  3.48335887  3.78088813  0.          2.78791096]
 ..., 
 [ 3.60408952  3.60391203  3.64328267  4.1156462   5.          3.77933333]
 [ 2.66773792  2.66785516  3.49177798  3.7985113   4.          2.83631844]
 [ 3.26622238  3.26615124  3.58861468  4.00121327  5.          3.49693169]]

But something weird is going on:

indexes = ~np.isnan(x)
print(indexes)

[out]:

[[ True  True  True  True  True  True]
 [ True  True  True  True  True  True]
 [ True  True  True  True  True  True]
 ..., 
 [ True  True  True  True  True  True]
 [ True  True  True  True  True  True]
 [ True  True  True  True  True  True]]
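A note on the printout above: NumPy abbreviates large arrays with `...`, so any `False` entries in the middle rows are simply not shown. A minimal sketch of how the mask could be checked without relying on the printed output; the array here is a made-up stand-in for `x`, with one NaN planted in a row that the abbreviated printout would hide:

```python
import numpy as np

# Hypothetical stand-in for x: 1000 rows, one NaN hidden in the middle.
x = np.ones((1000, 6))
x[500, 2] = np.nan

mask = np.isnan(x)                            # element-wise, same shape as x
has_nan = bool(mask.any())                    # is there any NaN at all?
n_nan = int(mask.sum())                       # how many NaN entries
nan_rows = np.flatnonzero(mask.any(axis=1))   # indices of rows containing a NaN

print(has_nan, n_nan, nan_rows)
```

If `has_nan` comes back `False` on the real data, there is nothing to remove and the lengths of `x` and `y` should already agree.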

Do you mean `y = y[~np.isnan(x)]` above? Don't forget to call `x = x[~np.isnan(x)]` _after_ this statement. — xnx, Dec 17 '14 at 18:43
Try `np.mat(x)[~np.isnan(x)]`. `np.array(x)[~np.isnan(x)]` is going to return a 1d array while np.mat will keep its dimensions. — Kyle G, Dec 17 '14 at 19:15

Jaime · Accepted Answer · 2014-12-17T23:49:59.173

You are removing the individual items that are NaN, not the rows that contain a NaN; indexing with the element-wise mask also flattens the result to 1D. The proper thing to do would be:

mask = ~np.any(np.isnan(x), axis=1)
x = x[mask]
y = y[mask]

To see the different behavior of both approaches:

>>> x = np.random.rand(4, 5)
>>> x[[0, 2], [1, 4]] = np.nan
>>> x
array([[ 0.37499461,         nan,  0.51254549,  0.5253203 ,  0.3955948 ],
       [ 0.73817831,  0.70381481,  0.45222295,  0.68540433,  0.76113544],
       [ 0.1651173 ,  0.41594257,  0.66327842,  0.86836192,         nan],
       [ 0.70538764,  0.31702821,  0.04876226,  0.53867849,  0.58784935]])
>>> x[~np.isnan(x)]  # 1D array with NaNs removed
array([ 0.37499461,  0.51254549,  0.5253203 ,  0.3955948 ,  0.73817831,
        0.70381481,  0.45222295,  0.68540433,  0.76113544,  0.1651173 ,
        0.41594257,  0.66327842,  0.86836192,  0.70538764,  0.31702821,
        0.04876226,  0.53867849,  0.58784935])
>>> x[~np.any(np.isnan(x), axis=1)]  # 2D array with rows with NaN removed
array([[ 0.73817831,  0.70381481,  0.45222295,  0.68540433,  0.76113544],
       [ 0.70538764,  0.31702821,  0.04876226,  0.53867849,  0.58784935]])
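For completeness, the row mask can be wrapped in a small helper that filters both arrays together. This is only a sketch: the function name `drop_nan_rows` and the sample data are made up for illustration, and it assumes `x` is a 2-D float array whose rows line up with the entries of `y`.

```python
import numpy as np

def drop_nan_rows(x, y):
    """Return x and y without the rows where x contains at least one NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    mask = ~np.isnan(x).any(axis=1)   # True for rows that have no NaN
    return x[mask], y[mask]

# Hypothetical data: rows 0 and 2 contain a NaN.
x = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, np.nan],
              [9.0, 10.0, 11.0]])
y = np.array([0, 1, 0, 1])

x_clean, y_clean = drop_nan_rows(x, y)
assert len(x_clean) == len(y_clean)
```

Computing the mask once and applying it to both arrays avoids the ordering problem raised in the comments, where `x` is overwritten before `y` has been filtered.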

for me, `~np.any(np.isnan(x, axis=1))` returns an error: `TypeError: 'axis' is an invalid keyword to ufunc 'isnan'` — alvas, Dec 17 '14 at 22:15
I messed up with the location of the parenthesis, it should be `~np.any(np.isnan(x), axis=1)`. — Jaime, Dec 17 '14 at 23:49

score 2 · Answer 2 · answered Dec 17 '14 at 18:44

indexes = ~np.isnan(x)
x = x[indexes]
y = y[indexes]

Kyle G

I'm getting `IndexError: too many indices for array` for your answer and also @xnx method. – alvas Dec 17 '14 at 18:58
Are you sure that `x` and `y` are the same length? – Kyle G Dec 17 '14 at 18:58
The plural of index is "indices", not "indexes"! Otherwise acknowledged! – jkalden Dec 17 '14 at 20:10
Oxford Dictionary, see e.g. https://english.stackexchange.com/questions/61080/plural-of-index-indexes-or-indices/3126 – Bart Nov 29 '17 at 13:04
@Bart I appreciate the citation and hence accept indexes; however, the citation leaves the issue undecided, and as I am a scientist I stick to "indices" ;) – jkalden Nov 30 '17 at 12:59
What is the reason for "~" here? – Chris8447 Apr 09 '20 at 05:46
@Chris8447 `~` is the `invert` operator, i.e. `~np.array([True, False]) == np.array([False, True])`. see https://docs.scipy.org/doc/numpy/reference/generated/numpy.invert.html – Kyle G Apr 09 '20 at 13:18
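The `IndexError` reported in the comments above can be reproduced with a small example. This is a sketch on made-up data, assuming `x` is 2-D and `y` is 1-D: the element-wise mask has the shape of `x`, so it cannot index `y`, whereas a per-row mask has one entry per row and works for both.

```python
import numpy as np

# Hypothetical data: row 0 of x contains a NaN.
x = np.array([[1.0, np.nan],
              [2.0, 3.0],
              [4.0, 5.0]])
y = np.array([10, 20, 30])

elem_mask = ~np.isnan(x)              # shape (3, 2): one flag per element
row_mask = ~np.isnan(x).any(axis=1)   # shape (3,): one flag per row

flat = x[elem_mask]                   # flattened to 1-D, row structure is lost

try:
    y[elem_mask]                      # 2-D mask applied to a 1-D array
    error = None
except IndexError as exc:
    error = exc                       # "too many indices for array"

y_clean = y[row_mask]                 # keeps the y entries for rows 1 and 2
```

Here `~` only flips each boolean in the mask; whether the result is usable for `y` depends on the shape of the mask it is applied to.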
