Masking in Dask

Question

I was wondering if someone could show me how to apply functions such as "sum" or "mean" to masked arrays using dask. I want to calculate the sum / mean of the array over only the values that are not masked.

Code:

import dask.array as da
import numpy as np
import numpy.ma as ma
dset = [1, 2, 3, 4]
masked = ma.masked_equal(dset, 4)  # let's say 4 should be masked
print(np.sum(masked))  # output: 6
print(np.mean(masked))  # output: 2.0
print(masked)  # output: [1 2 3 --]
masked_array = da.from_array(masked, chunks=4)
print(masked_array.sum().compute())  # output: 10
print(masked_array.mean().compute())  # output: 2.5

Is there a way I can have my masked sum equal to np.sum(masked) and masked mean equal to 2 by ignoring the "4" value? It seems that numpy is able to ignore the "4" in its calculations but dask is not in this case.

If you believe that something is a bug then I suggest raising an issue — MRocklin, Jul 05 '18 at 18:21
Solved! You just need to use the dask "mask" rather than numpy "mask" to calculate a mean or sum with the mask. — Chen, Jul 06 '18 at 16:12
I recommend providing this as an answer to your question. That way other people will find it more quickly — MRocklin, Jul 06 '18 at 18:16

score 0 · Answer 1 · answered Jun 04 '21 at 17:17

Dask supports several operations on masked arrays; the full list is available in the Dask docs.

Example of computing the mean and sum of a masked array:

import dask.array as da

dset = da.array([1, 2, 3, 4])
mdata = da.ma.masked_equal(dset, 4)  # mask the value 4 using dask's own masking
print(da.sum(mdata).compute())  # output: 6
print(da.ma.average(mdata).compute())  # output: 2.0
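
To check that the mask really is applied per chunk, the following is a minimal sketch along the same lines. It assumes a reasonably recent Dask release that provides da.ma.average, and the two-element chunk size and the fill value of 0 are arbitrary choices for illustration, not something taken from the question.

```python
import dask.array as da
import numpy as np

# Plain (unmasked) data, split into two chunks so the masked value
# sits in a different chunk from some of the unmasked values.
data = da.from_array(np.array([1, 2, 3, 4]), chunks=2)

# Build the mask lazily with dask's masking helper instead of numpy.ma.
masked = da.ma.masked_equal(data, 4)

# Reductions skip the masked entry.
masked_sum = masked.sum().compute()
masked_mean = da.ma.average(masked).compute()

# Inspect which entries are masked, and the data with masked entries
# replaced by a fill value (0 is an arbitrary choice here).
mask = da.ma.getmaskarray(masked).compute()
filled = da.ma.filled(masked, 0).compute()

print(int(masked_sum))     # 6
print(float(masked_mean))  # 2.0
print(mask.tolist())       # [False, False, False, True]
print(filled.tolist())     # [1, 2, 3, 0]
```

Creating the mask with da.ma functions keeps the masking step inside the Dask graph, which appears to be what the "use the dask mask rather than the numpy mask" comment above refers to.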
