
I have a flat array a:

a = numpy.array([0, 1, 1, 2, 3, 1, 2])

And an array b of indices marking the start of each "chunk":

b = numpy.array([0, 4])

I know I can find the maximum in each "chunk" using a reduction:

m = numpy.maximum.reduceat(a, b)
# m is array([2, 3], dtype=int32)

But is there a way to find the index of the maximum within a chunk (like numpy.argmax), using vectorized operations (no lists, no loops)?

Benjamin
  • Deleted my question temporarily because I thought I had an answer: `numpy.argmax(numpy.equal.outer(m,a), axis=1)`, but that doesn't work for examples where the same max occurs in many places... – Benjamin Jan 24 '17 at 17:07
  • For instance on this array: `a = numpy.array([0, 1, 1, 3, 3, 1, 2])`, where the same maximum `3` occurs in the two chunks. – Benjamin Jan 24 '17 at 17:13
  • The problem is that `np.maximum` is a `ufunc` with `reduceat` - which effectively iterates through the array, comparing 2 values at a time. But `np.max` and `np.argmax` are functions that operate on the whole array at once. They aren't `ufunc`. – hpaulj Jan 24 '17 at 17:27
  • @hpaulj, yes, I'm aware of that. I'm asking if anyone can think of a workaround with the same behaviour. – Benjamin Jan 24 '17 at 17:29
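
To make the failure described in the comments above concrete, here is a small sketch that runs the `equal.outer` attempt on the array with a repeated maximum. The loop-based `expected` is only a reference for comparison (not the vectorized solution being asked for), and the names `attempt`, `ends` and `expected` are illustrative.

```python
import numpy

a = numpy.array([0, 1, 1, 3, 3, 1, 2])
b = numpy.array([0, 4])

# Per-chunk maxima: both chunks have maximum 3
m = numpy.maximum.reduceat(a, b)

# The attempted one-liner: first position in all of a where a == m[i]
attempt = numpy.argmax(numpy.equal.outer(m, a), axis=1)

# Reference computed with an explicit loop (global indices into a)
ends = numpy.append(b[1:], a.size)
expected = numpy.array([s + a[s:e].argmax() for s, e in zip(b, ends)])

print(attempt)   # [3 3]
print(expected)  # [3 4]
```

The second chunk's result points at index 3, which lies in the first chunk, because the comparison against `m` ignores chunk boundaries.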

1 Answer


Borrowing the idea from this post.

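Steps involved:

  • Offset every element of a group by a per-group multiple of a limit-offset (one more than the global maximum), so that the value ranges of different groups no longer overlap. Sorting the offset array globally then keeps each group in its own block of positions, while ordering the elements within each group.

  • In the sorted order, the last element of each group's block is that group's max. The sort index at that position is the argmax into the flat array; subtracting the group's start index gives the argmax relative to the group.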
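
The two steps can be traced on the question's example arrays. This is only an illustrative sketch: the names `group_id`, `shifted`, `order` and `last_pos` are made up for the trace, and `kind="stable"` is an assumption added here so that tied values sort deterministically.

```python
import numpy as np

a = np.array([0, 1, 1, 2, 3, 1, 2])
b = np.array([0, 4])

n = a.max() + 1                                   # limit-offset: 4

# Step 1: give each element its group id, then shift group k up by k*n
group_len = np.diff(np.append(b, a.size))         # [4 3]
group_id = np.repeat(np.arange(b.size), group_len)  # [0 0 0 0 1 1 1]
shifted = a + n * group_id                        # [0 1 1 2 7 5 6]

# Global sort: groups stay in their own blocks, sorted internally
order = shifted.argsort(kind="stable")            # [0 1 2 3 5 6 4]

# Step 2: the last slot of each block holds that group's maximum
last_pos = np.append(b[1:], a.size) - 1           # [3 6]
global_argmax = order[last_pos]                   # [3 4]
local_argmax = global_argmax - b                  # [3 0]

print(global_argmax, local_argmax)
```

With a stable sort, a maximum that occurs several times within a group resolves to its last occurrence, which differs from `numpy.argmax` (first occurrence).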

Thus, a vectorized implementation would be -

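import numpy as np

def numpy_argmax_reduceat(a, b):
    n = a.max()+1  # limit-offset; assumes a is non-negative
    # Length of each group, including the last one that runs to the end of a
    grp_count = np.append(b[1:] - b[:-1], a.size - b[-1])
    # Per-element offset: group k is shifted up by k*n
    shift = n*np.repeat(np.arange(grp_count.size), grp_count)
    sortidx = (a+shift).argsort()
    # Position of the last (largest) element of each group in the sorted order
    grp_shifted_argmax = np.append(b[1:],a.size)-1
    # Convert flat indices to indices relative to each group's start
    return sortidx[grp_shifted_argmax] - b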

As a minor and possibly faster tweak, we could instead create shift with cumsum, giving a variation of the earlier approach, like so -

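import numpy as np

def numpy_argmax_reduceat_v2(a, b):
    n = a.max()+1  # limit-offset; assumes a is non-negative
    # Mark the first element of every group except the first
    id_arr = np.zeros(a.size,dtype=int)
    id_arr[b[1:]] = 1
    # Cumulative sum turns the marks into group ids 0, 1, 2, ...
    shift = n*id_arr.cumsum()
    sortidx = (a+shift).argsort()
    # Position of the last (largest) element of each group in the sorted order
    grp_shifted_argmax = np.append(b[1:],a.size)-1
    # Convert flat indices to indices relative to each group's start
    return sortidx[grp_shifted_argmax] - b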
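
A quick way to sanity-check both variants is to confirm that the element at each returned index equals the per-chunk maximum from `np.maximum.reduceat`. The sketch below repeats the two functions so that it runs on its own. The random test data, the seed, and the chunk starts are arbitrary assumptions, and `b` is assumed to be strictly increasing.

```python
import numpy as np

def numpy_argmax_reduceat(a, b):
    n = a.max() + 1
    grp_count = np.append(b[1:] - b[:-1], a.size - b[-1])
    shift = n * np.repeat(np.arange(grp_count.size), grp_count)
    sortidx = (a + shift).argsort()
    return sortidx[np.append(b[1:], a.size) - 1] - b

def numpy_argmax_reduceat_v2(a, b):
    n = a.max() + 1
    id_arr = np.zeros(a.size, dtype=int)
    id_arr[b[1:]] = 1
    sortidx = (a + n * id_arr.cumsum()).argsort()
    return sortidx[np.append(b[1:], a.size) - 1] - b

# The question's example
a = np.array([0, 1, 1, 2, 3, 1, 2])
b = np.array([0, 4])
print(numpy_argmax_reduceat(a, b))     # [3 0]
print(numpy_argmax_reduceat_v2(a, b))  # [3 0]

# Random non-negative data with repeated maxima inside chunks
rng = np.random.default_rng(0)
a2 = rng.integers(0, 10, size=50)
b2 = np.array([0, 7, 20, 21, 40])
for fn in (numpy_argmax_reduceat, numpy_argmax_reduceat_v2):
    idx = fn(a2, b2)
    # Compare values rather than indices: with ties, the index picked by
    # the sort need not match numpy.argmax's first occurrence.
    assert np.array_equal(a2[b2 + idx], np.maximum.reduceat(a2, b2))
```

The check compares values instead of indices because the default `argsort` is not guaranteed to be stable, so which of several equal maxima is reported within a chunk is not fixed.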
Divakar
  • Both solutions work out nicely in my case, because I already have `shift` from an earlier operation. Nice answer. – Benjamin Jan 24 '17 at 18:31
  • Hey, you answered my question a few months ago. Could you take a look at this post: https://stackoverflow.com/questions/67680199/getting-the-min-and-the-index-of-chunks-numpy-python/67694852#67694852? It is related to this question. – tony selcuk May 25 '21 at 20:23