
I have an NxN NumPy pairwise array (matrix) of double values. Each element (i,j) is a measurement between items i and j. The diagonal, where i == j, is 1 because it is a measurement of an item against itself. The matrix is therefore symmetric, so all of its information sits in one triangle (the half above the diagonal mirrors the half below it).

A truncated representation:

[[1.         0.11428571 0.04615385 ... 0.13888889 0.07954545 0.05494505]
 [0.11428571 1.         0.09836066 ... 0.06578947 0.09302326 0.07954545]
 [0.04615385 0.09836066 1.         ... 0.07843137 0.09821429 0.11711712]
 ...
 [0.13888889 0.06578947 0.07843137 ... 1.         0.34313725 0.31428571]
 [0.07954545 0.09302326 0.09821429 ... 0.34313725 1.         0.64130435]
 [0.05494505 0.07954545 0.11711712 ... 0.31428571 0.64130435 1.        ]]

I want to extract the smallest k values without counting any pair twice, which would otherwise happen because of the pairwise duplication, e.g., (5,6) == (6,5). I also do not want to include any of the diagonal values of 1, where i == j.

I understand that numpy has the partition method and I've seen plenty of examples for a flat array, but I'm struggling to find anything straightforward for a pair-wise comparison matrix.

EDIT #1 Based on the first answer below, I implemented:

seventyPercentInt: int = round((populationSizeInt/100)*70)

upperTriangleArray = dataArray[np.triu_indices(len(dataArray),1)]
seventyPercentArray = upperTriangleArray[np.argpartition(upperTriangleArray,seventyPercentInt)][0:seventyPercentInt]

print(len(np.unique(seventyPercentArray)))

The upperTriangleArray NumPy array has 1133265 elements from which to pick the lowest k. In this case k is seventyPercentInt, which is around 1054. However, when I apply np.argpartition, every value returned is 0.

upperTriangleArray is a flat array of shape (1133265,).
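One quick way to check whether an all-zero result is genuine is to count how many upper-triangle entries are exactly 0 and compare that count with k. The sketch below uses a made-up random matrix as a stand-in for dataArray (the real data is not shown in this post), so the names data, upper and zero_count are illustrative assumptions only.

```python
import numpy as np

# Hypothetical stand-in for dataArray: a symmetric matrix with many zero entries.
rng = np.random.default_rng(0)
n = 50
raw = rng.random((n, n))
raw[raw < 0.6] = 0.0            # force a large share of zeros
data = np.triu(raw, 1)          # keep only the strict upper triangle
data = data + data.T            # mirror it to make the matrix symmetric
np.fill_diagonal(data, 1.0)     # self-comparison on the diagonal

upper = data[np.triu_indices(n, 1)]
k = 35                          # 70% of the item count, as in the first attempt

zero_count = int(np.count_nonzero(upper == 0.0))
smallest_k = upper[np.argpartition(upper, k)[:k]]

# If zero_count exceeds k, the k smallest values are necessarily all 0.
print(zero_count, k, np.unique(smallest_k))
```

If the number of zeros is larger than k, np.argpartition is behaving correctly and the result simply reflects the data.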

SOLUTION

As per the first reply below (the accepted answer), my code that worked:

upperTriangleArray = dataArray[np.triu_indices(len(dataArray),1)]

seventyPercentInt: int = round((len(upperTriangleArray)/100)*70)

seventyPercentArray = upperTriangleArray[np.argpartition(upperTriangleArray,seventyPercentInt)][0:seventyPercentInt]

I ran into some trouble of my own making with seventyPercentInt. Rather than taking 70% of the number of pairwise elements (the length of upperTriangleArray), I had taken 70% of the number of items being compared (populationSizeInt). These are two very different values.
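To make the gap between the two values concrete, here is a minimal sketch of the arithmetic. The item count of 1506 is not stated anywhere in this post; it is inferred from the 1133265 upper-triangle elements, since 1506 * 1505 / 2 == 1133265. The helper pair_count is a hypothetical name introduced for illustration.

```python
def pair_count(n: int) -> int:
    """Number of unique off-diagonal pairs in an n x n symmetric matrix."""
    return n * (n - 1) // 2

n_items = 1506                  # inferred from the array size, not stated in the post
n_pairs = pair_count(n_items)   # length of the flattened upper triangle

# 70% of the items versus 70% of the pairs, using the same formula as the post.
k_from_items = round((n_items / 100) * 70)
k_from_pairs = round((n_pairs / 100) * 70)

print(n_pairs, k_from_items, k_from_pairs)
```

The first k is on the order of a thousand, while the second is on the order of hundreds of thousands, which is why the two versions select very different slices of the data.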

Anthony Nash

1 Answer


You can use np.triu_indices to keep only the values of the upper triangle.

Then you can use np.argpartition as in the example below.

import numpy as np

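# Symmetric pairwise matrix with 1.0 on the diagonal
A = np.array([[1.0, 0.1, 0.2, 0.3],
              [0.1, 1.0, 0.4, 0.5],
              [0.2, 0.4, 1.0, 0.6],
              [0.3, 0.5, 0.6, 1.0]])

# Offset 1 skips the diagonal, so each pair appears exactly once
A_upper_triangle = A[np.triu_indices(len(A), 1)]

print(A_upper_triangle)
# prints [0.1 0.2 0.3 0.4 0.5 0.6]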

k=2

print(A_upper_triangle[np.argpartition(A_upper_triangle, k)][0:k])
# prints the two smallest values, 0.1 and 0.2 (order within the first k is not guaranteed)
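If the (i, j) positions of the smallest values are also needed, the index arrays returned by np.triu_indices can be kept and indexed with the same partition result. This is only a sketch on a small symmetric 4x4 example; the names rows, cols, values and smallest_pairs are illustrative and not part of the approach described above.

```python
import numpy as np

A = np.array([[1.0, 0.1, 0.2, 0.3],
              [0.1, 1.0, 0.4, 0.5],
              [0.2, 0.4, 1.0, 0.6],
              [0.3, 0.5, 0.6, 1.0]])

# Keep the row and column indices of the strict upper triangle.
rows, cols = np.triu_indices(len(A), 1)
values = A[rows, cols]

k = 2
idx = np.argpartition(values, k)[:k]   # positions of the k smallest, unordered
idx = idx[np.argsort(values[idx])]     # sort just those k positions ascending

smallest_pairs = [(int(i), int(j), float(v))
                  for i, j, v in zip(rows[idx], cols[idx], values[idx])]
print(smallest_pairs)
# prints [(0, 1, 0.1), (0, 2, 0.2)]
```

Sorting only the k selected positions keeps the cost low compared with sorting the whole flattened triangle.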
baptou
  • Thanks. I've edited my original post so you can see what I tried implementing based on your answer. On using np.argpartition, I'm only getting *0* as an answer, despite there being plenty of values to pick the lowest *k*. – Anthony Nash May 27 '21 at 09:56
  • can you try print(np.sort(upperTriangleArray)[0:seventyPercentInt]) ? – baptou May 27 '21 at 10:11
  • It returns a list of 0. I've just copied your code, including the example array *A*, and it works. It just fails when I use my own array. – Anthony Nash May 27 '21 at 10:15
  • I think I know the issue... it concerns my seventyPercentInt value. I will test some code, make a further edit, and hopefully have a solution. If it works, then your answer will be "accepted". – Anthony Nash May 27 '21 at 10:25