Recombine arrays obtained from subsetting on some of the dimensions of original array

Question

I have a 3-dimensional array, which I subset using a boolean mask over 2 of its 3 dimensions:

import dask.array as da
import numpy as np

np.random.seed(40)
test_arr = np.random.normal(size=(2,3,4))

array([[[-0.6075477 , -0.12613641, -0.68460636,  0.92871475],
        [-1.84440103, -0.46700242,  2.29249034,  0.48881005],
        [ 0.71026699,  1.05553444,  0.0540731 ,  0.25795342]],

       [[ 0.58828165,  0.88524424, -1.01700702, -0.13369303],
        [-0.4381855 ,  0.49344349, -0.19900912, -1.27498361],
        [ 0.29349415,  0.10895031,  0.03172679,  1.27263986]]])

bool_check = test_arr[:,:,0] < 0.6

array([[ True,  True, False],
       [ True,  True,  True]])

# shape is (5,4)
arr1 = test_arr[bool_check]
# shape is (1,4)
arr2 = test_arr[~bool_check]

Note that I would rather have made test_arr a Dask array from the start, but Dask doesn't let me subset with a 2-D boolean mask the way NumPy does.

Now imagine that in my actual use case I do a bunch of manipulations that are irrelevant here, and then want to reconstitute arr1 and arr2 into arr3 using the same mask. How would I do it?

arr3 = da.zeros_like(test_arr)

# this gives an error
arr3[da.from_array(bool_check)] = arr1

ValueError: Boolean index assignment in Dask expects equally shaped arrays.
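
For comparison, here is a minimal sketch of the same round trip done entirely in NumPy, where boolean-mask assignment with a 2-D mask on a 3-D array is supported. It reuses the names from the question and skips the intermediate manipulations, so arr3 should simply reproduce test_arr.

```python
import numpy as np

np.random.seed(40)
test_arr = np.random.normal(size=(2, 3, 4))

# 2-D mask over the first two dimensions
bool_check = test_arr[:, :, 0] < 0.6

# Each selection has shape (number_of_selected_rows, 4)
arr1 = test_arr[bool_check]
arr2 = test_arr[~bool_check]

# Write both pieces back into an array of the original shape
arr3 = np.zeros_like(test_arr)
arr3[bool_check] = arr1
arr3[~bool_check] = arr2
```

Because the mask selects whole rows along the last axis, the two assignments together cover every element of arr3 exactly once.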

So I did get this to work using all numpy arrays, but with Dask I get an error: ValueError: Boolean index assignment in Dask expects equally shaped arrays. "Alternatively, you can use the extended API that supports indexing with tuples" — matsuo_basho, Jul 19 '23 at 20:38
Looks like this is a known dask limitation: https://stackoverflow.com/questions/72273565/how-to-apply-a-2d-boolean-array-on-a-3d-dask-array-in-python https://github.com/dask/dask/issues/7550 — matsuo_basho, Jul 21 '23 at 17:31
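
One possible way around the limitation, sketched under a strong assumption: if the manipulations can be applied to the full-shaped array rather than to the compressed (n, 4) pieces, the mask can be broadcast over the last axis and the two branches combined with da.where, so no boolean-index assignment is needed. The names branch_true and branch_false, and the arithmetic applied to them, are hypothetical stand-ins for the real manipulations; this may not fit a use case that genuinely needs the compressed arrays.

```python
import dask.array as da
import numpy as np

np.random.seed(40)
test_arr = np.random.normal(size=(2, 3, 4))
bool_check = test_arr[:, :, 0] < 0.6

# Wrap the data and the mask as Dask arrays with aligned chunks
x = da.from_array(test_arr, chunks=(1, 3, 4))
mask = da.from_array(bool_check, chunks=(1, 3))

# Hypothetical stand-ins for the manipulations on each subset,
# applied here to the full array instead of the compressed pieces
branch_true = x * 2
branch_false = x - 1

# Add a trailing axis so the (2, 3) mask broadcasts against (2, 3, 4)
arr3 = da.where(mask[:, :, None], branch_true, branch_false)

result = arr3.compute()
```

This computes both branches for every row and discards the unused one, which trades some extra work for staying lazy and keeping the original shape throughout.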
