I have the following data formats:

  1. type1 = np.array([...], dtype='float64') # vector of floats
  2. type2 = np.array([type1, type1, ...], dtype='object') # array of type 1
  3. type3 = np.array([type2, type2, ...], dtype='object') # array of type 2
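A minimal sketch of how such nested containers can be built, assuming NumPy and a hypothetical helper named `as_object_array` (not part of NumPy). It fills a preallocated object array element by element, because `np.array([...], dtype='object')` stacks equal-length chunks into a 2D object array instead of producing a 1D array of vectors.

```python
import numpy as np

def as_object_array(items):
    """Hypothetical helper: pack each item into one slot of a 1D object array."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # integer indexing stores the array itself as one element
    return out

signal = np.arange(12, dtype="float64")                # type1: one 1D vector

type2 = as_object_array([signal[:4], signal[4:]])      # chunks of unequal length
type3 = as_object_array([
    type2,
    as_object_array([signal[:6], signal[6:]]),         # chunks of equal length
])

# For comparison: equal-length chunks passed straight to np.array are stacked
# into a 2D object array of shape (2, 6), not a 1D array holding two vectors.
stacked = np.array([signal[:6], signal[6:]], dtype="object")
```

With this construction `type2` and `type3` are always 1D, so code that walks the nesting level by level sees the same structure regardless of chunk lengths.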

(For context: type1 would be a time-series signal, and the higher-order types would be constructed by splitting the signal into smaller and smaller chunks of different sizes.)

Now, I'd like to apply a function (e.g. filtering) to each 1D vector (type1), with the following requirements:

  1. The iterator accepts type1, type2 or type3 data and applies the function to each type1 element.
  2. It is possible to iterate over two arrays of the same dimensionality at once.
  3. It is possible to extract additional info from the iterator.

So the signature could look like:

out = apply([main_array, aux_array], function_to_apply, **kwargs)

with `main_array` modified inside `apply`.

One possible approach is `np.nditer`, as in "Vectorwise iteration of nD array with nditer"; however, I haven't been able to create a working solution yet. Any ideas?

EDIT: Thanks to @JeromeRichards and @hpaulj for pointing me in the right direction. I ended up with the following recursive solution:

def zip_none(data, aux):
    '''Zip `data` with every sequence in `aux`; `aux=None` means no auxiliary data.'''
    return zip(data, *aux) if aux is not None else zip(data)

def apply_1d(data, func, aux=None, **kwargs):
    '''Apply `func` to each 1D vector in `data`.

    data   - (nested) list of 1D np arrays, or a single 1D np array
    aux    - list of additional inputs, each nested exactly like `data`
    func   - function to be applied to each 1D vector
    kwargs - params to be passed to `func`'''
    if not isinstance(data, list):
        # Leaf: a single 1D vector, so call `func` directly.
        return func(data, *aux, **kwargs) if aux is not None else func(data, **kwargs)

    result = []
    multi_output = False
    for x, *y in zip_none(data, aux):
        # Pass the matching aux elements one level down (None if there are none).
        tmp = apply_1d(x, func, aux=y if y else None, **kwargs)
        result.append(tmp)
        if isinstance(tmp, tuple):
            multi_output = True

    if multi_output:
        # Transpose a list of output tuples into a tuple of per-output lists.
        result = tuple(list(out) for out in zip(*result))
    return result
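The recursion above dispatches on `list`, so an object-dtype array like type2 or type3 would be passed to `func` whole. Below is a hedged sketch of a variant that also descends into object arrays. The names `is_container`, `apply_nested` and `demean` are made up for illustration, and tuple-valued (multi-output) results are deliberately not handled here.

```python
import numpy as np

def is_container(data):
    """True for lists and object-dtype arrays, False for numeric 1D vectors."""
    if isinstance(data, list):
        return True
    return isinstance(data, np.ndarray) and data.dtype == object

def apply_nested(data, func, aux=(), **kwargs):
    """Apply func to every numeric vector in data; the nesting comes back as lists."""
    if not is_container(data):
        # Leaf: a plain numeric vector plus its matching auxiliary vectors.
        return func(data, *aux, **kwargs)
    # Walk data and every aux container in lockstep, one nesting level at a time.
    return [apply_nested(x, func, aux=ys, **kwargs) for x, *ys in zip(data, *aux)]

def demean(x, weights, scale=1.0):
    """Hypothetical per-vector function: subtract the weighted mean, then scale."""
    return scale * (x - np.average(x, weights=weights))

# type2-like input: a 1D object array holding two vectors of different length.
data = np.empty(2, dtype=object)
data[0] = np.array([1.0, 2.0, 3.0])
data[1] = np.array([4.0, 6.0])

# Auxiliary input with the same nesting structure (uniform weights).
weights = np.empty(2, dtype=object)
weights[0] = np.ones(3)
weights[1] = np.ones(2)

result = apply_nested(data, demean, aux=[weights], scale=2.0)
```

Returning new lists rather than modifying `data` in place keeps the function free of side effects; writing results back into the object array would be a small change at the container level.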
  • Generic CPython functions cannot be vectorized automatically. Even NumPy's `np.vectorize` is not able to do that; the only way is to do it manually. Besides, note that `object`-typed arrays are slow, since NumPy cannot truly vectorize operations on such arrays because they hold CPython objects. Pure-CPython loops are inherently slow, and using `np.nditer` does not really help; it is just for convenience. – Jérôme Richard Dec 27 '22 at 14:53
  • @JérômeRichard thanks. In fact, convenience is what I am looking for (more than performance optimization). I'd like to hide and unify the looping code (1D / 2D / 3D cases) while applying the same data processing in each case. – Piotr Herbut Dec 27 '22 at 15:00
  • I'm not sure why you are referencing my `nditer` answer. Even in the simple copy case, `nditer` shows only modest performance gains, and I tried to stress how complicated it is. I believe you can set a flag to allow object dtypes, but I don't recall doing much with it, and I don't see how that would help generalize your nesting. Work out the details of your nested nests with explicit loops or list comprehensions. For the most part, object dtype arrays are no better than lists, and in some ways worse. – hpaulj Dec 27 '22 at 16:52
