
In most metric learning tasks, at some point we have a similarity matrix of size KxM, where K is the number of new samples and M is the number of database samples.

From each row of this matrix we need to choose only the N samples with the largest similarity values, where N << M.

Typical way to do so in Python is:

import numpy as np

def get_args_of_best_score(score_matrix, N):
    # Reverse-slice the ascending argsort to get the top-N column indices per row
    sorted_arg_matrix = np.argsort(score_matrix, axis=1)[:, :-N-1:-1]
    return sorted_arg_matrix

This gives us a matrix of size KxN containing, for each row, the positions of the N highest-scoring elements, sorted by score in descending order.
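For example, on a small 2x5 matrix (values chosen arbitrarily for illustration):

```python
import numpy as np

def get_args_of_best_score(score_matrix, N):
    sorted_arg_matrix = np.argsort(score_matrix, axis=1)[:, :-N-1:-1]
    return sorted_arg_matrix

scores = np.array([[0.1, 0.9, 0.3, 0.7, 0.5],
                   [0.8, 0.2, 0.6, 0.4, 0.0]])
print(get_args_of_best_score(scores, 3))
# [[1 3 4]
#  [0 2 3]]
# row 0: columns 1, 3, 4 hold the largest scores (0.9, 0.7, 0.5)
# row 1: columns 0, 2, 3 hold the largest scores (0.8, 0.6, 0.4)
```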

But for very large matrices, e.g. M > 10000, this can be very slow, since argsort fully sorts every row. Is there a good way to speed it up?

ZFTurbo

1 Answer


It can be done with the following function, which uses np.argpartition to select the top N elements in O(M) per row and then sorts only those N values:

import numpy as np

def get_args_of_best_score_fast(score_matrix, N):
    # Get the top-N column indices per row (unsorted) in O(M) per row
    arg_part = np.argpartition(-score_matrix, N, axis=1)[:, :N]
    # https://stackoverflow.com/questions/26322232/how-to-apply-the-output-of-numpy-argpartition-for-2-d-arrays
    v_part = score_matrix[np.arange(score_matrix.shape[0])[:, None], arg_part]

    # Now sort only the small KxN array of selected scores, descending
    sorted_args = np.argsort(v_part, axis=1)[:, ::-1]
    sorted_arg_matrix = arg_part[np.arange(arg_part.shape[0])[:, None], sorted_args]
    return sorted_arg_matrix
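As a quick sanity check (a sketch, assuming no tied scores, which is practically guaranteed with continuous random values), the fast version produces the same result as the plain argsort version:

```python
import numpy as np

def get_args_of_best_score(score_matrix, N):
    return np.argsort(score_matrix, axis=1)[:, :-N-1:-1]

def get_args_of_best_score_fast(score_matrix, N):
    arg_part = np.argpartition(-score_matrix, N, axis=1)[:, :N]
    v_part = score_matrix[np.arange(score_matrix.shape[0])[:, None], arg_part]
    sorted_args = np.argsort(v_part, axis=1)[:, ::-1]
    return arg_part[np.arange(arg_part.shape[0])[:, None], sorted_args]

rng = np.random.default_rng(0)
scores = rng.random((100, 10000))
assert np.array_equal(get_args_of_best_score(scores, 50),
                      get_args_of_best_score_fast(scores, 50))
```

The speedup comes from argpartition doing a selection (average O(M) per row) instead of a full O(M log M) sort, so only the N selected values per row are ever fully sorted.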