
I was wondering if someone could show me how to apply functions such as "sum" or "mean" to masked arrays using dask. I want to calculate the sum / mean of the array over only the values that are not masked.

Code:

import dask.array as da
import numpy as np
import numpy.ma as ma
dset = [1, 2, 3, 4] 
masked = ma.masked_equal(dset, 4) # let's say 4 should be masked
print(np.sum(masked)) # output: 6
print(np.mean(masked)) # output: 2.0
print(masked) # output: [1 2 3 --]
masked_array = da.from_array(masked, chunks=4)
print(masked_array.sum().compute()) # output: 10
print(masked_array.mean().compute()) # output: 2.5

Is there a way I can have my masked sum equal to np.sum(masked) and masked mean equal to 2 by ignoring the "4" value? It seems that numpy is able to ignore the "4" in its calculations but dask is not in this case.

Chen

1 Answer


Dask supports several operations on masked arrays; the full list is available in the Dask docs.

Example of computing the mean and sum of a masked array:

import dask.array as da

dset = da.array([1, 2, 3, 4])
mdata = da.ma.masked_equal(dset, 4)  # mask every element equal to 4
print(da.sum(mdata).compute()) # output: 6
print(da.ma.average(mdata).compute()) # output: 2.0
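If you would rather not depend on masked-array support at all, the same numbers can probably be obtained with a plain boolean mask. The following is only a sketch under the assumptions of the question (the value 4 is the one to ignore); the names `valid`, `masked_sum` and `masked_mean` are made up for illustration and are not part of the Dask API:

```python
import numpy as np
import dask.array as da

# Plain (unmasked) dask array holding the example data from the question.
data = da.from_array(np.array([1, 2, 3, 4]), chunks=2)

# Boolean array that is True wherever the value should be kept.
valid = data != 4

# Sum only the valid entries by replacing the ignored ones with 0.
total = da.where(valid, data, 0).sum()

# Number of valid entries, used as the denominator for the mean.
count = valid.sum()

masked_sum = total.compute()
masked_mean = (total / count).compute()

print(masked_sum)   # output: 6
print(masked_mean)  # output: 2.0
```

This keeps everything lazy until `.compute()` is called, so it should behave the same way on arrays that are split across many chunks.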