common workaround
The most efficient way I am aware of is to handle the mask manually. Here a short benchmark for calculating a masked mean
along an axis. As of 2021 (np.version 1.19.2) the manual implementation is 3x faster.
It is worth noting, that
np.nanmean
is as slow as ma.mean
. However, I did not find an easy workaround for that, as 0 * nan -> nan
and np.where
is time consuming.
opencv
usually has a mask argument for its routines. But switching library might not be suitable in most cases.
benchmark
benchmark manual (np.sum(..values..)/np.sum(..counts..))
time for 100x np_mean: 0.15721
benchmark ma.mean
time for 100x ma_mean: 0.580072
benchmark np.nanmean
time for 100x nan_mean: 0.609166
np_mean[:5]: [0.74468436 0.75447124 0.75628326 0.74990387 0.74708414]
ma_mean[:5]: [0.7446843592460088 0.7544712410870448 0.7562832614361736
0.7499038657880674 0.747084143818861]
nan_mean[:5]: [0.74468436 0.75447124 0.75628326 0.74990387 0.74708414]
np_mean == ma_mean: True
np_mean == nan_mean: True
np.__version__: 1.19.2
code
import timeit
import numpy as np
import numpy.ma as ma
np.random.seed(0)
arr = np.random.rand(1000, 1000)
msk = arr > .5 # POSITIV mask: only emelemts > .5 are processed
print('\nbenchmark manual (np.sum(..values..)/np.sum(..counts..))')
np_mean = np.sum(arr * msk, axis=0)/np.sum(msk, axis=0)
t = timeit.Timer('np_mean = np.sum(arr * msk, axis=0)/np.sum(msk, axis=0)', globals=globals())
print('\ttime for 100x np_mean:', round(t.timeit(100), 6))
print('\nbenchmark ma.mean')
ma_arr = ma.masked_array(arr, mask=~msk)
ma_mean = ma.mean(ma_arr, axis=0)
t = timeit.Timer('ma_mean = ma.mean(ma_arr, axis=0)', globals=globals())
print('\ttime for 100x ma_mean:', round(t.timeit(100), 6))
print('\nbenchmark np.nanmean')
nan_arr = arr.copy()
nan_arr[~msk] = np.nan
nan_mean = np.nanmean(nan_arr, axis=0)
t = timeit.Timer('nan_mean = np.nanmean(nan_arr, axis=0)', globals=globals())
print('\ttime for 100x nan_mean:', round(t.timeit(100), 6))
print('\n')
print('np_mean[:5]:', np_mean[:5])
print('ma_mean[:5]:', ma_mean[:5])
print('nan_mean[:5]:', nan_mean[:5])
print('np_mean == ma_mean: ', (np_mean == ma_mean).all())
print('np_mean == nan_mean: ', (np_mean == nan_mean).all())
print('np.__version__:', np.__version__)
The manaul Version only works if there are no nans
in the array.
If arr
contains nans
:
Just construct the mask by msk = np.isnan(arr)
and afterwards replace the nans in arr
by arr = np.nan_to_num(arr, copy=False, nan=0)
.