
For each column in a 2D NumPy array, the column's maximum value can appear more than once. I would like to find the row index for each column maximum, without repeating row indices.

Here is an example that demonstrates why np.argmax doesn't work:

import numpy as np

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

ind = np.argmax(a, axis=0)

print(ind)

Output:

[0 0 2]

I want the result: [1, 0, 2] for this example.

That is:

  • The row index for the second column must be 0
  • This implies that the row index for the first column must be 1
  • This in turn implies that the row index for the third column must be 2

A slightly more complex example is this array:

a = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 0, 1]])

In this case, there is no column with a unique maximum value. I'd be happy with either of these answers:

  • [0, 1, 2]
  • [1, 0, 2]

An even more complex example is:

a = np.array([[1, 1, 1],
              [1, 1, 1],
              [0, 1, 1]])

In this case, I'd be happy with any of these answers:

  • [0, 1, 2]
  • [0, 2, 1]
  • [1, 0, 2]
  • [1, 2, 0]
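
The acceptance criterion used in the examples above can be written down as a small check: every chosen row must hold its column's maximum, and no row may be chosen twice. This is only a sketch for testing candidate answers; `is_valid_assignment` is a hypothetical helper name, and it assumes one row index is supplied per column.

```python
import numpy as np

def is_valid_assignment(a, rows):
    """Return True if rows[j] is a row where column j attains its maximum
    and no row index is used twice (hypothetical helper, one index per column)."""
    a = np.asarray(a)
    rows = np.asarray(rows)
    cols = np.arange(a.shape[1])
    # Every picked entry must equal the maximum of its column.
    hits_max = np.all(a[rows, cols] == a.max(axis=0))
    # Row indices must be pairwise distinct.
    distinct = len(np.unique(rows)) == len(rows)
    return bool(hits_max and distinct)

a = np.array([[1, 1, 1],
              [1, 1, 1],
              [0, 1, 1]])

for candidate in ([0, 1, 2], [0, 2, 1], [1, 0, 2], [1, 2, 0]):
    print(candidate, is_valid_assignment(a, candidate))

# A plain argmax-style answer repeats row 0 and is rejected.
print([0, 0, 2], is_valid_assignment(a, [0, 0, 2]))
```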

I can solve these problems with loops and logical conditions, but I'm wondering if there is a way to solve the problem using numpy functions?

ToddP
  • Is this max value always 1? – amzon-ex Jun 25 '20 at 04:02
  • No, the max value can be anything, and there can also be a different max value in each column. But for now I'd be happy with a solution for the cases above. – ToddP Jun 25 '20 at 04:04
  • Is there a guarantee that such an assignment exists (that is, that every column has a row carrying its maximum which is distinct from the rows chosen for the other columns)? – Ehsan Jun 25 '20 at 04:30
  • Yes, there is this guarantee, like in the cases above. – ToddP Jun 25 '20 at 04:46
  • @ToddP If your matrix is not too big (or the maximum in each column is not repeated too many times), the suggested answer below might help. It also works when the columns have different maximums. – Ehsan Jun 25 '20 at 09:05

2 Answers


May be overkill, but you can use scipy.optimize.linear_sum_assignment:

import numpy as np
from scipy.optimize import linear_sum_assignment

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

linear_sum_assignment(-a.T)[1]
# array([1, 0, 2])

Note that you can always reduce to the 0,1 case using something like

abin = a==a.max(axis=0)

This can speed up the assignment quite a bit.
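
A minimal sketch of how the two pieces might be combined, assuming an assignment with distinct rows exists as stated in the question. The function name `unique_argmax` and the validity check are illustrative additions, not part of SciPy. The mask is cast to `int` before negating, since NumPy does not support unary minus on boolean arrays.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def unique_argmax(a):
    """For each column of `a`, return a row holding that column's maximum,
    using each row at most once (hypothetical helper)."""
    a = np.asarray(a)
    # 1 where an entry equals its column maximum, 0 elsewhere.
    abin = (a == a.max(axis=0)).astype(int)
    # Rows of the cost matrix are the columns of `a`; negating the mask
    # turns the minimum-cost assignment into a maximum-weight one.
    cols, rows = linear_sum_assignment(-abin.T)
    # Fail loudly if some column is unassigned or misses its maximum.
    if len(rows) < a.shape[1] or not abin[rows, cols].all():
        raise ValueError("no assignment of distinct rows hits every column maximum")
    return rows

# Columns with different maximum values reduce to the same 0/1 pattern
# as the first example in the question.
a = np.array([[5, 2, 0],
              [5, 0, 7],
              [1, 0, 7]])
print(unique_argmax(a))  # [1 0 2]
```

Because the reduced matrix only contains 0 and 1, the sum being maximized is simply the number of columns that receive one of their maxima.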

Alternatively, see this post for a graph theory solution.
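
One possible graph-based formulation, which is not necessarily what the linked post does, treats rows and columns as the two sides of a bipartite graph with an edge wherever a column attains its maximum. The sketch below uses `scipy.sparse.csgraph.maximum_bipartite_matching`; the wrapper name `matching_argmax` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching

def matching_argmax(a):
    """Match every column of `a` to a distinct row holding its maximum
    (hypothetical helper built on a bipartite matching)."""
    a = np.asarray(a)
    # Edge (row i, column j) exists where a[i, j] is the maximum of column j.
    abin = (a == a.max(axis=0)).astype(int)
    graph = csr_matrix(abin)
    # perm_type='row': element j of the result is the row matched to
    # column j, or -1 if column j is left unmatched.
    rows = maximum_bipartite_matching(graph, perm_type='row')
    if (rows < 0).any():
        raise ValueError("some column cannot be matched to a distinct row")
    return rows

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])
print(matching_argmax(a))  # [1 0 2]
```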

Paul Panzer
  • Amazing answer. Upvoted. Would take your answers over mine anytime :) Originally posted my question to see if there is a better answer for this, but seems this is even a better answer than that. – Ehsan Jun 26 '20 at 00:24


Inspired by the solution suggested here:

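import numpy as np
import numpy_indexed as npi

# `a` is the 2D input array, e.g. the one in the example below.
# (row, column) positions of every column maximum
ind = np.argwhere(a == a.max(0))
# Group the row indices by column: one array of candidate rows per column.
# The groups can differ in length, so they are not wrapped in np.array.
l = npi.group_by(ind[:, 1]).split(ind[:, 0])
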
def pick_one(a, index, buffer, visited):
    if index == len(a):
        return True
    for item in a[index]:
        if item not in visited:
            buffer.append(item)
            visited.add(item)
            if pick_one(a, index + 1, buffer, visited):
                return True
            buffer.pop()
            visited.remove(item)
    return False


buffer = []
pick_one(l, 0, buffer, set())
print(buffer)

example:

a = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

output:

[1, 0, 2]
Ehsan
  • This is nice, but please see the end of my original post: I'm looking for a solution that doesn't require looping and logical checks. – ToddP Jun 25 '20 at 21:04
  • I see your point. While I do not know the backend of scipy package in solution offered by @Paul above, I can bet it is faster than my answer. Good luck. – Ehsan Jun 26 '20 at 00:23
  • @ToddP Also, I would guess it would be faster to use Paul's answer to my post https://stackoverflow.com/questions/62571292/find-a-list-of-unique-representatives-elements-from-a-list-of-arrays if you care more about performance and not the readability of code. Feel free to check it. `l` in my answer would be the array to feed to that post's answer. Please let me know if you need help with merging two answers. – Ehsan Jun 26 '20 at 00:29