I'm trying to use a boolean mask to address rows in a numpy array:

isnan = np.isnan(self.X[:, AGE_COLUMN].astype(float))
self.X[isnan, AGE_COLUMN] = np.mean(self.X[:, AGE_COLUMN].astype(float))

isnan and X are dtype.

First I check which rows in the age column are NaN, and then I want to set those values to the mean of all ages. The debugger shows the following result for self.X[isnan, AGE_COLUMN]:

[nan nan nan nan nan nan nan nan nan nan ....]

If I try self.X[[True, False, True], AGE_COLUMN] for example it returns the indexed rows. But with the isnan array it does not work.

How can I fix this so that the NaNs are set to the mean?

L3n95
Numpy has `nanmean` and `nan_to_num` functions; those should help. –  Apr 30 '17 at 10:21
The `mean` of `something` and `nan` is `nan`; you should get rid of the `nan`s before computing the mean. – MMF Apr 30 '17 at 10:23

1 Answer

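The mask itself works; the problem is that np.mean returns NaN whenever the column contains a NaN, so the NaNs are being overwritten with NaN. Use numpy.nanmean instead, which ignores NaNs when computing the mean: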

self.X[isnan, AGE_COLUMN] = np.nanmean(self.X[:, AGE_COLUMN].astype(float))
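As a rough, self-contained sketch of the same fix outside the class: the array `X`, its contents, and the value `AGE_COLUMN = 1` below are made up for illustration and only stand in for `self.X` and the real column index. It shows that `np.mean` yields NaN on such a column while `np.nanmean` does not.

```python
import numpy as np

AGE_COLUMN = 1  # hypothetical column index, chosen only for this sketch

# Toy stand-in for self.X: an object array mixing labels and ages.
X = np.array(
    [["a", 20.0], ["b", np.nan], ["c", 40.0], ["d", np.nan]],
    dtype=object,
)

# Convert the age column to float so that np.isnan can be applied.
ages = X[:, AGE_COLUMN].astype(float)
isnan = np.isnan(ages)

plain_mean = np.mean(ages)         # NaN: a single NaN propagates through the mean
nan_aware_mean = np.nanmean(ages)  # mean of the non-NaN entries only

# The boolean mask selects the NaN rows; assign the NaN-aware mean to them.
X[isnan, AGE_COLUMN] = nan_aware_mean
filled = X[:, AGE_COLUMN].astype(float)
```

Here `filled` no longer contains NaNs, whereas assigning `plain_mean` would have written NaN back into the same rows.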

From the documentation:

numpy.nanmean(a, axis=None, dtype=None, out=None, keepdims=)

Compute the arithmetic mean along the specified axis, ignoring NaNs.

Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.

For all-NaN slices, NaN is returned and a RuntimeWarning is raised.
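The all-NaN case matters for imputation: if every age is missing, `np.nanmean` still returns NaN. The sketch below records the warning and falls back to a default; `DEFAULT_AGE` is a hypothetical placeholder, not something from the question.

```python
import warnings

import numpy as np

DEFAULT_AGE = 0.0  # hypothetical fallback value, an assumption for this sketch

all_missing = np.array([np.nan, np.nan, np.nan])

# Record warnings instead of printing them, so the RuntimeWarning can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.nanmean(all_missing)

got_runtime_warning = any(
    issubclass(w.category, RuntimeWarning) for w in caught
)

# Guard against writing NaN back into the data when no valid ages exist.
fill_value = DEFAULT_AGE if np.isnan(result) else result
```

Whether a fixed default is appropriate depends on the data; the point is only that the result of `np.nanmean` may itself need a NaN check before it is used as a fill value.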

Cecilia
MMF