I've got an array with three rows and about 25,000 columns. I'm trying to drop the columns that have a NaN value in any of the three rows, but I'm struggling to do so. So far I've managed the following, which drops rows with NaN values instead:

import numpy as np

x = np.array([[1, 2, 3, 1, 2, 3],
              [4, 5, np.nan, 3, 5, np.nan],
              [7, 8, 9, 4, 5, 6]])

x = x[~np.isnan(x).any(axis=1)]

If I use axis=0 though, this doesn't work. I'm trying not to convert it into a DataFrame, since it's used as an array in the workflow, but I guess a workaround would be to convert it to a DataFrame, drop the columns, and convert it back to an array. Maybe someone has an idea how to do it as an array though :)


1 Answer

You can use np.isnan(x).any(axis=0) to find the columns that contain at least one np.nan value, then use the negation of this boolean mask to select only the NaN-free columns of the given array. Use:

import numpy as np

x = np.array([[1, 2, 3, 1, 2, 3],
              [4, 5, np.nan, 3, 5, np.nan],
              [7, 8, 9, 4, 5, 6]])

x = x[:, ~np.isnan(x).any(axis=0)]
print(x)

This prints:

[[1. 2. 1. 2.]
 [4. 5. 3. 5.]
 [7. 8. 4. 5.]]
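If you prefer to work with column indices rather than a boolean mask (say, to log which columns were dropped), the same idea can be expressed with np.where and np.delete. A small sketch of that alternative:

```python
import numpy as np

x = np.array([[1, 2, 3, 1, 2, 3],
              [4, 5, np.nan, 3, 5, np.nan],
              [7, 8, 9, 4, 5, 6]])

# Indices of columns that contain at least one NaN.
bad_cols = np.where(np.isnan(x).any(axis=0))[0]  # array([2, 5])

# Remove those columns along axis=1 (columns).
x = np.delete(x, bad_cols, axis=1)
print(x)
```

Note that np.delete returns a new array rather than modifying in place, so for a 3 × 25,000 array the boolean-mask version above is just as good; the index-based form is mainly useful when you also need to know which columns were removed.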