Say I have a 3D dask array representing a time series of temperature for the whole U.S., with dimensions [Time, Lat, Lon]. I want to get tabular time series for 100 different locations. With numpy fancy indexing this would look something like x[:, [lat1, lat2...], [lon1, lon2...]]. Dask arrays do not yet allow this kind of indexing. What is the best way to accomplish this task given that limitation?
Philip Blankenau
1 Answer
Use the vindex indexer. It accepts only pointwise indices (lists of integers) or full slices:
In [1]: import dask.array as da
In [2]: import numpy as np
In [3]: x = np.arange(1000).reshape((10, 10, 10))
In [4]: dx = da.from_array(x, chunks=(5, 5, 5))
In [5]: xcoords = [1, 3, 5]
In [6]: ycoords = [2, 4, 6]
In [7]: x[:, xcoords, ycoords]
Out[7]:
array([[ 12,  34,  56],
       [112, 134, 156],
       [212, 234, 256],
       [312, 334, 356],
       [412, 434, 456],
       [512, 534, 556],
       [612, 634, 656],
       [712, 734, 756],
       [812, 834, 856],
       [912, 934, 956]])
In [8]: dx.vindex[:, xcoords, ycoords].compute()
Out[8]:
array([[ 12, 112, 212, 312, 412, 512, 612, 712, 812, 912],
       [ 34, 134, 234, 334, 434, 534, 634, 734, 834, 934],
       [ 56, 156, 256, 356, 456, 556, 656, 756, 856, 956]])
A few caveats:
This is not (yet) available for numpy arrays, but it has been proposed. See the proposal here.
This is not fully compatible with numpy fancy indexing, because it always places the new axes at the front. A simple transpose can rearrange them, though:
In [9]: dx.vindex[:, xcoords, ycoords].T.compute()
Out[9]:
array([[ 12,  34,  56],
       [112, 134, 156],
       [212, 234, 256],
       [312, 334, 356],
       [412, 434, 456],
       [512, 534, 556],
       [612, 634, 656],
       [712, 734, 756],
       [812, 834, 856],
       [912, 934, 956]])
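Applied to the question's [Time, Lat, Lon] layout, the same pattern might look like the sketch below. The array sizes, chunk sizes, and the names temps, lat_idx, and lon_idx are made up for illustration, and the indices are assumed to be integer grid positions rather than latitude/longitude values.

```python
import dask.array as da
import numpy as np

# Hypothetical stand-in for the temperature cube: 24 time steps on a 50 x 60 grid.
n_time, n_lat, n_lon = 24, 50, 60
data = np.arange(n_time * n_lat * n_lon, dtype="float64")
temps = da.from_array(data.reshape(n_time, n_lat, n_lon), chunks=(12, 25, 30))

# Hypothetical grid indices (not coordinate values) for 100 locations.
lat_idx = np.arange(100) % n_lat
lon_idx = (np.arange(100) * 7) % n_lon

# vindex places the pointwise axis first, so this has shape (100, n_time).
by_location = temps.vindex[:, lat_idx, lon_idx]

# Transpose to get one row per time step and one column per location.
table = by_location.T.compute()
print(table.shape)  # (24, 100)
```

The resulting table is an in-memory numpy array with one row per time step, which could then be handed to whatever tabular tool is preferred.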

jiminy_crist