
Say I have a 3D dask array representing a time series of temperature for the whole U.S., [Time, Lat, Lon]. I want to get tabular time series for 100 different locations. With numpy fancy indexing this would look something like [:, [lat1, lat2...], [lon1, lon2...]]. Dask arrays do not yet allow this kind of indexing. What is the best way to accomplish this task given that limitation?

1 Answer


Use the vindex indexer. It accepts only pointwise indices or full slices:

In [1]: import dask.array as da

In [2]: import numpy as np

In [3]: x = np.arange(1000).reshape((10, 10, 10))

In [4]: dx = da.from_array(x, chunks=(5, 5, 5))

In [5]: xcoords = [1, 3, 5]

In [6]: ycoords = [2, 4, 6]

In [7]: x[:, xcoords, ycoords]
Out[7]:
array([[ 12,  34,  56],
       [112, 134, 156],
       [212, 234, 256],
       [312, 334, 356],
       [412, 434, 456],
       [512, 534, 556],
       [612, 634, 656],
       [712, 734, 756],
       [812, 834, 856],
       [912, 934, 956]])

In [8]: dx.vindex[:, xcoords, ycoords].compute()
Out[8]:
array([[ 12, 112, 212, 312, 412, 512, 612, 712, 812, 912],
       [ 34, 134, 234, 334, 434, 534, 634, 734, 834, 934],
       [ 56, 156, 256, 356, 456, 556, 656, 756, 856, 956]])

A few caveats:

  • This is not (yet) available for numpy arrays, but it has been proposed. See the proposal here.

  • This is not fully compatible with numpy fancy indexing, because it always places the new axes at the front. A simple transpose can rearrange them, though:

For example:

In [9]: dx.vindex[:, xcoords, ycoords].T.compute()
Out[9]:
array([[ 12,  34,  56],
       [112, 134, 156],
       [212, 234, 256],
       [312, 334, 356],
       [412, 434, 456],
       [512, 534, 556],
       [612, 634, 656],
       [712, 734, 756],
       [812, 834, 856],
       [912, 934, 956]])
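
Applied to the [Time, Lat, Lon] case from the question, the same pattern might look like the sketch below. The array contents, chunk sizes, and the names temps, lat_idx, and lon_idx are made up for illustration; only vindex, .T, and compute come from the session above.

```python
import dask.array as da
import numpy as np

# Hypothetical stand-in for the temperature cube, laid out as [Time, Lat, Lon]
temps = da.from_array(
    np.arange(4 * 6 * 8).reshape((4, 6, 8)), chunks=(2, 3, 4)
)

# Hypothetical grid indices of the locations of interest (paired pointwise)
lat_idx = [0, 2, 5]
lon_idx = [1, 3, 7]

# vindex puts the pointwise axis first: shape (n_locations, n_times)
by_location = temps.vindex[:, lat_idx, lon_idx]

# Transpose so each row is a time step and each column is a location
table = by_location.T.compute()
```

The resulting table has one column per (lat, lon) pair, so it can be handed to a tabular tool as is.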
jiminy_crist