I've got an array with three rows and about 25,000 columns. I'm trying to drop the columns that have a NaN value in any of the three rows, but I'm struggling to do so. So far I've managed the following, which drops rows with NaN values instead:

import numpy as np

x = np.array([[1, 2, 3, 1, 2, 3],
              [4, 5, np.nan, 3, 5, np.nan],
              [7, 8, 9, 4, 5, 6]])

x = x[~np.isnan(x).any(axis=1)]

If I use axis=0 though, this doesn't work. I'm trying not to convert it into a DataFrame, since it's used as an array in the workflow, but I guess a workaround would be to convert it to a DataFrame, drop the columns, and convert it back to an array. Maybe someone has an idea how to do it as an array though :)


1 Answer

You can use np.isnan(x).any(axis=0) to find the columns that contain at least one np.nan value, then use the negation of this boolean mask to select only the NaN-free columns of the given array. Use:

import numpy as np

x = np.array([[1, 2, 3, 1, 2, 3],
              [4, 5, np.nan, 3, 5, np.nan],
              [7, 8, 9, 4, 5, 6]])

x = x[:, ~np.isnan(x).any(axis=0)]
print(x)

This prints:

[[1. 2. 1. 2.]
 [4. 5. 3. 5.]
 [7. 8. 4. 5.]]
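If you prefer to work with column indices rather than a boolean mask (say, to log which columns were dropped), the same idea can be expressed with np.where and np.delete. A small sketch of that alternative:

```python
import numpy as np

x = np.array([[1, 2, 3, 1, 2, 3],
              [4, 5, np.nan, 3, 5, np.nan],
              [7, 8, 9, 4, 5, 6]])

# Indices of columns that contain at least one NaN.
bad_cols = np.where(np.isnan(x).any(axis=0))[0]  # array([2, 5])

# Remove those columns along axis=1 (columns).
x = np.delete(x, bad_cols, axis=1)
print(x)
```

Note that np.delete returns a new array rather than modifying in place, so for a 3 × 25,000 array the boolean-mask version above is just as good; the index-based form is mainly useful when you also need to know which columns were removed.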