
Is there a way to use the Python Dask package to mimic a NumPy masked array and do calculations that respect the mask, as in NumPy:

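import numpy as np

# 9999 marks missing values
data = np.array([0, 1, 9999, 2, 1, 0, 9999])
value = 9999
# Mask every element equal to the sentinel value
mdata = np.ma.masked_where(data == value, data)
# The result keeps the mask, so masked elements stay masked
result = mdata * 2 + 10
print(result)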

In the package documentation I only found dask.array, which is equivalent to NumPy ndarrays and doesn't support masks. In addition, slicing with another array also seems not to be possible. Therefore I can't find a way to do calculations on only part of an array.

mastho

2 Answers


You're correct: as of December 2016, dask.array does not support masked arrays.
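
Until then, one workaround that stays within plain dask.array is to keep the sentinel values and apply the calculation element-wise with `da.where`. This is a minimal sketch of an alternative to masking, not masked-array support; the chunk size of 4 is an arbitrary choice for illustration.

```python
import numpy as np
import dask.array as da

# Sentinel-valued input, split into chunks of 4 elements (arbitrary choice)
data = da.from_array(np.array([0, 1, 9999, 2, 1, 0, 9999]), chunks=4)
value = 9999

# Apply the calculation only where data is valid and keep the sentinel elsewhere.
# da.where is element-wise, so every output chunk has a known shape.
result = da.where(data == value, value, data * 2 + 10)

print(result.compute())
```

Because the sentinel survives in the output, downstream steps must keep treating 9999 as "missing", which is exactly the bookkeeping a mask would otherwise do for you.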

Additionally, dask.array needs to know the shape of each block at every step, so slicing by another dask array (which would require knowing that array's values while the task graph is being built) is not supported.
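
Later Dask releases relaxed this restriction, and the way they did it illustrates the shape constraint described above. Assuming a recent Dask version, boolean indexing with another dask array is accepted, but the resulting chunk sizes are unknown (reported as NaN) until the data is actually computed. The chunk size of 4 is again an arbitrary illustration choice.

```python
import numpy as np
import dask.array as da

data = da.from_array(np.array([0, 1, 9999, 2, 1, 0, 9999]), chunks=4)
value = 9999

# Boolean indexing by another dask array (accepted in later Dask releases)
valid = data[data != value]

# How many elements survive in each block depends on the values, so Dask
# cannot know the chunk sizes before computing: they are reported as NaN.
print(valid.chunks)
print(valid.compute())

# compute_chunk_sizes() evaluates the data once to recover concrete sizes
valid = valid.compute_chunk_sizes()
print(valid.chunks)
```

Operations that need concrete shapes (for example reshaping or aligning with another array) typically fail on the NaN-chunked array until `compute_chunk_sizes()` has been called.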

MRocklin
  • Thank you for the answer. Nevertheless, mask support would be very useful for applying dask to remote-sensing satellite image processing. Within the [pytroll](http://www.pytroll.org/) community we are looking forward to having this feature available in dask. I will add a feature request on the GitHub issue tracker then. – mastho Dec 06 '16 at 15:14
  • We have some of the masked out-of-core algorithms in biggus, and I have long wanted to get some time to implement them in dask. – pelson Jan 11 '17 at 23:35

Since May 2017, Dask has supported masked arrays (git link), including basic operations on them.

The code snippet below yields the same result as the NumPy version.

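import dask.array as da

data = da.array([0, 1, 9999, 2, 1, 0, 9999])
value = 9999
# Lazily mask the sentinel values, mirroring np.ma.masked_where
mdata = da.ma.masked_where(data == value, data)
# compute() evaluates the graph and returns a NumPy masked array
result = (mdata * 2 + 10).compute()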
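
To check the "same result" claim directly, here is a minimal sketch, assuming a recent Dask release, that compares the Dask pipeline against `numpy.ma` and shows a few more masked operations: mask-aware reductions and `da.ma.filled`. The chunk size of 4 and the fill value of -1 are arbitrary illustration choices.

```python
import numpy as np
import dask.array as da

value = 9999
np_data = np.array([0, 1, 9999, 2, 1, 0, 9999])

# Reference result computed eagerly with numpy.ma
expected = np.ma.masked_where(np_data == value, np_data) * 2 + 10

# Same pipeline in dask, split into two chunks to exercise blockwise execution
data = da.from_array(np_data, chunks=4)
mdata = da.ma.masked_where(data == value, data)
lazy = mdata * 2 + 10

# compute() returns a numpy.ma.MaskedArray, so mask and data can be compared
result = lazy.compute()
print(type(result))
print(np.array_equal(result.mask, expected.mask))
print(np.array_equal(result.compressed(), expected.compressed()))

# Reductions skip masked entries, and filled() swaps them for a fill value
print(mdata.sum().compute())             # sum of the unmasked values only
print(mdata.mean().compute())            # mean over the unmasked values only
print(da.ma.filled(lazy, -1).compute())  # plain ndarray with -1 where masked
```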