Using Chunked array (Awkward lib) for fancy indexing or masking

Question

I am loading a root file with uproot.lazyarrays() which produces a Table. I compute a function of this table which returns a JaggedArray whose length is equal to the length of the table. This is in the form of a ChunkedArray, and I would like to use it as a mask, applied on another JaggedArray derived from the table, again in the form of a ChunkedArray. However, it seems this does not work.

While running this: partial_prediction = array[variable][additional_selection_mask] I get the following error:

~/.local/lib/python3.7/site-packages/awkward/array/chunked.py in __getitem__(self, where)
    334                 if isinstance(h, awkward.array.virtual.VirtualArray):
    335                     h = h.array
--> 336                 chunks.append(c[h, tail])
    337                 chunksizes.append(len(chunks[-1]))
    338             return self.copy(chunks=chunks, chunksizes=chunksizes)

~/.local/lib/python3.7/site-packages/awkward/array/jagged.py in __getitem__(self, where)
    768 
    769                 else:
--> 770                     raise TypeError("cannot interpret shape {0}, dtype {1} as a fancy index or mask".format(head.shape, head.dtype))
    771 
    772             if isinstance(node, self.numpy.ndarray) and len(node.shape) < sum(0 if isinstance(x, slice) else 1 for x in tail):

TypeError: cannot interpret shape (0,), dtype float64 as a fancy index or mask

To illustrate what the arrays look like, here is the code that prints them:

print("array[variable]")
print("type = ", type(array[variable]))
print("len = ", len(array[variable]))
print("array = ", array[variable])
print()
print("additional_selection_mask")
print("type = ", type(additional_selection_mask))
print("len = ", len(additional_selection_mask))
print("array = ", additional_selection_mask)

which outputs the following:

array[variable]
type =  <class 'awkward.array.chunked.ChunkedArray'>
len =  126071
array =  [[3.9413936 2.9023154 2.9157693 ... 1.8322366 1.8115876 1.7142034] [2.514293 49.567352 13.9077 ... 2.4213006 1.6156256 2.8986027] [1.220779 1.2491984 1.4126266 ... 2.3114712 1.7704046 1.347573] ... [0.32590318 1.2202137 1.7752564 ... 2.4342306 2.7896073 1.1572217] [0.2279669 0.21500091 0.21401915 ... 0.18808685 0.16509545 0.15571955] [2.000058 1.925363 1.8934264 ... 2.5060847 2.1708803 2.227355]]

additional_selection_mask
type =  <class 'awkward.array.chunked.ChunkedArray'>
len =  126071
array =  [False False False ... False False False]

In this case, additional_selection_mask is not a JaggedArray, but in general it can be one, with the same shape as array[variable].

I believe the problem is that the chunk size differs between the two arrays. It is probably enough to either 1) convert the ChunkedArray into a JaggedArray, though it is not clear to me how to do that, or 2) make sure the chunk size is the same for both, which I also don't know how to do. — Nicolò Foppiani, Nov 19 '19 at 15:06
That line of code (chunked.py:336) has a bug in it that was raised by [scikit-hep/uproot#412](https://github.com/scikit-hep/uproot/issues/412) and fixed by [scikit-hep/awkward-array#216](https://github.com/scikit-hep/awkward-array/pull/216) this morning. Try upgrading. — Jim Pivarski, Nov 22 '19 at 20:01
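
For readers curious about the second idea in the question (giving the mask the same chunk sizes as the data), here is a minimal sketch of the concept using plain Python lists rather than Awkward arrays. The helper names `rechunk_like` and `apply_mask_chunkwise` are hypothetical and are not part of awkward or uproot; the sample values are made up for illustration. According to the comment above, the actual failure was a library bug, so upgrading is the real fix and this only shows why chunk boundaries matter when a mask is applied chunk by chunk.

```python
from itertools import accumulate


def rechunk_like(mask_chunks, target_sizes):
    """Re-split a chunked mask so its chunk lengths equal target_sizes.

    Hypothetical helper for illustration; not an awkward/uproot function.
    """
    # Flatten the mask across its current chunk boundaries.
    flat = [m for chunk in mask_chunks for m in chunk]
    if len(flat) != sum(target_sizes):
        raise ValueError("mask and data have different total lengths")
    # Cut the flat mask again at the data's chunk boundaries.
    stops = list(accumulate(target_sizes))
    starts = [0] + stops[:-1]
    return [flat[a:b] for a, b in zip(starts, stops)]


def apply_mask_chunkwise(data_chunks, mask_chunks):
    """Keep the entries of each data chunk whose mask entry is True."""
    aligned = rechunk_like(mask_chunks, [len(c) for c in data_chunks])
    return [
        [entry for entry, keep in zip(chunk, mask) if keep]
        for chunk, mask in zip(data_chunks, aligned)
    ]


# Jagged data in three chunks of 2, 1 and 2 entries (made-up values).
data_chunks = [[[3.9, 2.9], [2.5]], [[1.2, 1.3, 1.4]], [[0.3], [0.2, 0.1]]]
# One boolean per entry, but split into chunks of 3 and 2 entries.
mask_chunks = [[False, True, True], [False, True]]

selected = apply_mask_chunkwise(data_chunks, mask_chunks)
print(selected)
```

Without the re-splitting step, pairing the chunks directly would match a 2-entry data chunk with a 3-entry mask chunk, which is the kind of mismatch the question suspects.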
