2

Here is what I am trying to do with Numpy in Python 2.7. Suppose I have an array a defined by the following:

a = np.array([[1,3,3],[4,5,6],[7,8,1]])

I can do a.argmax(0) or a.argmax(1) to get the column-wise or row-wise argmax, respectively:

a.argmax(0)
Out[329]: array([2, 2, 1], dtype=int64)
a.argmax(1)
Out[330]: array([1, 2, 1], dtype=int64)

However, when there is a tie like in a's first row, I would like to get the argmax decided randomly between the ties (by default, Numpy returns the first element whenever a tie occurs in argmax or argmin).

Last year, someone put a question on solving Numpy argmax/argmin ties randomly: Select One Element in Each Row of a Numpy Array by Column Indices

However, that question was aimed at one-dimensional arrays, and the most-voted answer works well for that case. A second answer attempts to solve the problem for multidimensional arrays as well, but it doesn't work: it does not return, for each row/column, the index of the maximum value with ties resolved randomly.

What would be the most performant way to do that, since I am working with big arrays?

user3483203
blipblop
  • The linked SO doesn't seem relevant. If you want 'most performent', you need to first give us a working example. In order to claim my answer is better, I have to show that it gets the correct value, and runs faster. To do that I'd rather not make up my own example(s) and base method. – hpaulj Aug 19 '18 at 06:08

3 Answers

5

A simple way is to add a small random number to all the values at the start, so your data would be like this:

a = np.array([[1.1827,3.1734,3.9187],[4.8172,5.7101,6.9182],[7.1834,8.5012,1.9818]])

That can be done by a = a + np.random.random(a.shape).

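If you later need to get the original values back, you can do np.floor(a).astype(int) to drop the fractional parts (plain a.astype(int) truncates toward zero, so it only recovers the originals when they are non-negative).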
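As a minimal sketch of this idea, the noise can be added to a temporary copy so the original array never needs to be restored. The helper name `argmax_random_tie_noise` is made up for illustration, and the sketch assumes integer input, where distinct values differ by at least 1 and noise from [0, 1) can only reorder exact ties.

```python
import numpy as np

def argmax_random_tie_noise(a, axis):
    # Hypothetical helper name, not part of NumPy or of the answer above.
    # Assumes `a` holds integers, so noise in [0, 1) can reorder only exact ties.
    noisy = a + np.random.random(a.shape)  # new float array; `a` is left untouched
    return noisy.argmax(axis)

a = np.array([[1, 3, 3], [4, 5, 6], [7, 8, 1]])
picks = np.array([argmax_random_tie_noise(a, 1) for _ in range(1000)])
print(np.unique(picks[:, 0]))  # row 0 is tied between columns 1 and 2
```

For float input this assumption does not hold in general, because the noise may exceed the gap between the maximum and the runner-up.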

John Zwinck
  • 2
    You would have to guarantee that the number added is less than the smallest difference between a maximum number and the next largest (if the inputs are integers this works fine). Still, this is a clever answer. – user3483203 Aug 19 '18 at 06:14
3

Generic case solution to pick one per group

To solve the general case of picking one random number per group, where a list/array of numbers specifies the range for each pick, we can use a trick: create a uniform random array, add an offset per group so that each group's values fall in their own unit interval, and then perform argsort. The implementation would look something like this -

def random_num_per_grp(L):
    # For each element in L pick a random number within range specified by it
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)),L)
    offset = np.r_[0,np.cumsum(L[:-1])]
    return r1.argsort()[offset] - offset

Sample case -

In [217]: L = [5,4,2]

In [218]: random_num_per_grp(L) # i.e. select one per [0-5,0-4,0-2]
Out[218]: array([2, 0, 1])

So, the output would have same number of elements as in input L and the first output element would be in [0,5), second in [0,4) and so on.
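To make the trick easier to follow, here is the same computation unrolled into named intermediate steps for the sample L = [5, 4, 2]. This is only an illustrative walkthrough; the variable names `group_id`, `keys` and `order` are not from the function above.

```python
import numpy as np

L = np.array([5, 4, 2])                     # group sizes from the sample case
group_id = np.repeat(np.arange(len(L)), L)  # [0 0 0 0 0 1 1 1 1 2 2]
keys = np.random.rand(L.sum()) + group_id   # keys of group g fall in [g, g + 1)
order = keys.argsort()                      # groups stay contiguous, members are shuffled inside each group
offset = np.r_[0, np.cumsum(L[:-1])]        # position where each group starts: [0 5 9]
picks = order[offset] - offset              # first shuffled member of each group, relative to the group start
print(picks)
```

Because every key of group g is smaller than every key of group g + 1, sorting never mixes groups, so taking the first sorted entry of each group amounts to a uniform random pick within that group.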


Solving our problem here

To solve our case here, we would use a modified version (specifically, one that drops the offset subtraction at the end of the function), like so -

def random_num_per_grp_cumsumed(L):
    # For each element in L pick a random number within range specified by it
    # The final output would be a cumsumed one for use with indexing, etc.
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)),L)
    offset = np.r_[0,np.cumsum(L[:-1])]
    return r1.argsort()[offset] 

Approach #1

One solution could use it like so -

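def argmax_per_row_randtie(a):
    # Mark every position that holds its row's maximum (ties included)
    max_mask = a == a.max(1, keepdims=True)
    m, n = a.shape
    # Flat indices of all maxima, grouped row by row
    all_argmax_idx = np.flatnonzero(max_mask)
    # Flat index at which each row starts
    offset = np.arange(m) * n
    # Pick one maximum per row at random, then convert flat index to column index
    return all_argmax_idx[random_num_per_grp_cumsumed(max_mask.sum(1))] - offset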

Verification

Let's test it on the given sample with a large number of runs and count the number of occurrences of each index for each row

In [235]: a
Out[235]: 
array([[1, 3, 3],
       [4, 5, 6],
       [7, 8, 1]])

In [225]: all_out = np.array([argmax_per_row_randtie(a) for i in range(10000)])

# The first element (row=0) should have similar probabilities for 1 and 2
In [236]: (all_out[:,0]==1).mean()
Out[236]: 0.504

In [237]: (all_out[:,0]==2).mean()
Out[237]: 0.496

# The second element (row=1) should only have 2
In [238]: (all_out[:,1]==2).mean()
Out[238]: 1.0

# The third element (row=2) should only have 1
In [239]: (all_out[:,2]==1).mean()
Out[239]: 1.0

Approach #2 : Use masking for performance

We could make use of masking and hence avoid that flatnonzero, with the intention of gaining performance, as working with boolean arrays generally is faster. Also, we would generalize to cover both rows (axis=1) and columns (axis=0), giving us a modified version, like so -

def argmax_randtie_masking_generic(a, axis=1): 
    max_mask = a==a.max(axis=axis,keepdims=True)
    m,n = a.shape
    L = max_mask.sum(axis=axis)
    set_mask = np.zeros(L.sum(), dtype=bool)
    select_idx = random_num_per_grp_cumsumed(L)
    set_mask[select_idx] = True
    if axis==0:
        max_mask.T[max_mask.T] = set_mask
    else:
        max_mask[max_mask] = set_mask
    return max_mask.argmax(axis=axis) 

Sample runs on axis=0 and axis=1 -

In [423]: a
Out[423]: 
array([[1, 3, 3],
       [4, 5, 6],
       [7, 8, 1]])
In [424]: argmax_randtie_masking_generic(a, axis=1)
Out[424]: array([1, 2, 1])

In [425]: argmax_randtie_masking_generic(a, axis=1)
Out[425]: array([2, 2, 1])

In [426]: a[1,1] = 8

In [427]: a
Out[427]: 
array([[1, 3, 3],
       [4, 8, 6],
       [7, 8, 1]])

In [428]: argmax_randtie_masking_generic(a, axis=0)
Out[428]: array([2, 1, 1])

In [429]: argmax_randtie_masking_generic(a, axis=0)
Out[429]: array([2, 1, 1])

In [430]: argmax_randtie_masking_generic(a, axis=0)
Out[430]: array([2, 2, 1])
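The column case can be checked the same way as Approach #1 was verified. The sketch below repeats the two functions so it runs on its own, then counts how often each tied row is chosen for column 1 of the modified array; `share_row1` and `share_row2` are illustrative names, and the exact shares will vary from run to run.

```python
import numpy as np

def random_num_per_grp_cumsumed(L):
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)), L)
    offset = np.r_[0, np.cumsum(L[:-1])]
    return r1.argsort()[offset]

def argmax_randtie_masking_generic(a, axis=1):
    max_mask = a == a.max(axis=axis, keepdims=True)
    L = max_mask.sum(axis=axis)
    set_mask = np.zeros(L.sum(), dtype=bool)
    set_mask[random_num_per_grp_cumsumed(L)] = True
    if axis == 0:
        max_mask.T[max_mask.T] = set_mask
    else:
        max_mask[max_mask] = set_mask
    return max_mask.argmax(axis=axis)

a = np.array([[1, 3, 3], [4, 8, 6], [7, 8, 1]])
runs = np.array([argmax_randtie_masking_generic(a, axis=0) for _ in range(10000)])

# Column 1 holds 8 in rows 1 and 2, so both should be picked about equally often
share_row1 = (runs[:, 1] == 1).mean()
share_row2 = (runs[:, 1] == 2).mean()
print(share_row1)
print(share_row2)
```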
Divakar
1

You could use an array of random numbers, the same shape as your input, but mask out the array to only leave the candidates for selection.

import numpy as np

def rndArgMax(a, axis):
    a_max = a.max(axis, keepdims=True)
    tmp = np.random.random(a.shape) * (a == a_max)
    return tmp.argmax(axis)

a = np.random.randint(0, 3, size=(2, 3, 4))
print(rndArgMax(a, 1))
# Example output (varies between runs):
# [[1 1 2 1]
#  [0 1 1 1]]
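One way to sanity-check this on an N-dimensional input is to confirm that every returned index really points at a maximum along the chosen axis. The sketch below does that with np.take_along_axis, which assumes NumPy 1.15 or newer; `all_ok` is an illustrative name. Note that np.random.random draws from [0, 1), so a draw of exactly 0.0 at a maximum is possible in theory, although extremely unlikely.

```python
import numpy as np

def rndArgMax(a, axis):
    a_max = a.max(axis, keepdims=True)
    tmp = np.random.random(a.shape) * (a == a_max)
    return tmp.argmax(axis)

a = np.random.randint(0, 3, size=(2, 3, 4))

all_ok = True
for axis in range(a.ndim):
    idx = rndArgMax(a, axis)
    # Gather the values sitting at the returned indices along `axis`
    chosen = np.take_along_axis(a, np.expand_dims(idx, axis), axis)
    # Each gathered value should equal the maximum along that axis
    all_ok = all_ok and bool((chosen == a.max(axis, keepdims=True)).all())

print(all_ok)
```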
Bi Rico