define a numpy signature for output array of unknown length (for use in numba.guvectorize)

Question

Is it possible to create a signature for a numpy ufunc that returns a 1d array of unknown length?

I have a function that takes an array x of length n and an array of labels y of length m, performs a reduction, and returns an array out whose length cannot be derived from the input shapes.

The function itself will be wrapped with numba.guvectorize:

from numba import guvectorize, int16, int32, float64

@guvectorize([(int16[:], float64[:], int32[:], int16[:])], "(n),(m) -> (l)", nopython=True)
def fun(x, y, out):
    # perform stuff
    # ...
    # no return in guvectorize

This returns the following error:

NameError: undefined output symbols: l

My solution would be to pass in a template array of length l, but it wouldn't be used for any calculation, so I'd like to avoid it.

Any ways around this, or is passing in a template the best (and maybe not so bad) solution?

Edit:

Some valid points made in the comments I want to address:

The function is supposed to be applied on an array with dimensions (x, y, z) along the z dimension, which has length n.

The intended purpose of the function is to take each 1d array along z

[z,z,z,z,z,z,z,z,...,z]

and expand it to length m

[z1,z1,z1,z2,z2,z2,z2,z3,...,zz]

and finally the corresponding values are averaged

[z1,z2,z3,z4,z5,z6,z7,z8,...,zz]

resulting in an array with length l.
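To make the three steps concrete, here is a minimal NumPy sketch for a single 1d array. It is only an illustration under assumptions: the expansion rule (a hypothetical `repeats` array saying how often each element is repeated) and the label encoding (integers 0 to l-1 in `y`) are not specified above, and `expand_and_average` is a made-up name.

```python
import numpy as np

def expand_and_average(x, repeats, y, l):
    """Expand x from length n to length m, then average per label.

    Assumptions (not from the original post): `repeats` gives the repeat
    count of each element of x, and `y` holds integer labels in [0, l).
    """
    expanded = np.repeat(x, repeats)                  # length m = repeats.sum()
    sums = np.bincount(y, weights=expanded, minlength=l)
    counts = np.bincount(y, minlength=l)
    return sums / counts                              # length l

x = np.array([10.0, 20.0, 30.0])        # n = 3
repeats = np.array([2, 3, 1])           # expands to m = 6 values
y = np.array([0, 0, 0, 1, 1, 1])        # m = 6 labels, l = 2 groups
result = expand_and_average(x, repeats, y, 2)
```

Here the expanded array is [10, 10, 20, 20, 20, 30], so the two group means are 40/3 and 70/3. A label that never occurs would give a division by zero, which this sketch does not handle.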

I know exactly what the sizes m, n and l will be beforehand; I just need a way to "tell" the function. This is also why I don't expect any jagged outputs.

The fastest way to apply this to a big 3d array using xarray is with guvectorize. But this results in the issue above.

A working solution is to pass in a template of length l.

For comparison, I've created an @njit-wrapped function that manually loops over the first two dimensions, applying the same functionality.

Unfortunately this is still about 4 times slower than using guvectorize, so I'd like to use guvectorize for this application.

That doesn't make sense as a ufunc - there's no way to broadcast this, since the shapes don't work out right. — user2357112, Dec 17 '20 at 10:07
This sounds like a job for `numba.jit`, not `numba.guvectorize`. — user2357112, Dec 17 '20 at 10:08
@user2357112supportsMonica I'm explicitly using `guvectorize` because the function needs to be applied to a dimension of a 3d array (with the help of `xarray` and `dask`) - so as far as I can see it, no way to use `jit` for that — Val, Dec 17 '20 at 10:09
Well, I exactly know the input dimensions and the desired output dimensions, as well as the reductions happening. So I don't expect jagged output (edit: which is why I didn't give it too much thought) — Val, Dec 17 '20 at 10:18
