I have a largish 2d numpy array, and I want to extract the lowest 10 elements of each row as well as their indexes. Since my array is largish, I would prefer not to sort the whole array.

I heard about the argpartition() function, with which I can get the indexes of the lowest 10 elements:

top10indexes = np.argpartition(myBigArray,10)[:,:10]

Note that argpartition() partitions along axis -1 by default, which is what I want. The array it returns has the same shape as myBigArray and contains indexes into the respective rows, such that the first 10 indexes of each row point to the 10 lowest values (in no particular order). The [:,:10] slice keeps just those.
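To make the shapes concrete, here is a minimal sketch on a small random array. The array size and the seed are arbitrary choices for illustration, not part of the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed, illustration only
myBigArray = rng.random((5, 30))  # small stand-in for the real array

full = np.argpartition(myBigArray, 10)  # partitions along axis -1 by default
top10indexes = full[:, :10]

print(full.shape)          # same shape as myBigArray
print(top10indexes.shape)  # one row of 10 indexes per input row

# Row 0: the selected indexes point at the 10 smallest values of that row,
# though not necessarily in sorted order.
row0_values = myBigArray[0, top10indexes[0]]
assert np.array_equal(np.sort(row0_values), np.sort(myBigArray[0])[:10])
```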

How can I now extract the elements of myBigArray corresponding to those indexes?

Obvious fancy indexing like myBigArray[top10indexes] or myBigArray[:,top10indexes] does something quite different. I could also use a list comprehension, something like:

np.array([row[idxs] for row, idxs in zip(myBigArray, top10indexes)])

but that would incur a performance hit iterating numpy rows and converting the result back to an array.
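For reference, a small sketch of what the two naive indexing forms actually return, next to the row-by-row loop. The 40x30 shape is an assumption chosen so that the first form does not raise an IndexError (it would if any column index were larger than the number of rows).

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, illustration only
myBigArray = rng.random((40, 30))   # assumed shape, more rows than columns
top10indexes = np.argpartition(myBigArray, 10)[:, :10]

# Index array applied to axis 0: every index selects an entire row.
rows_picked = myBigArray[top10indexes]
print(rows_picked.shape)   # (40, 10, 30)

# Index array applied to axis 1 of every row: all rows x all index rows.
crossed = myBigArray[:, top10indexes]
print(crossed.shape)       # (40, 40, 10)

# The row-by-row loop gives the intended one-set-of-10-per-row result.
looped = np.array([row[idxs] for row, idxs in zip(myBigArray, top10indexes)])
print(looped.shape)        # (40, 10)
```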

NB: I could just use np.partition() to get the values, and they may even correspond to the indexes (or may not), but I don't want to do the partition twice if I can avoid it.

Saullo G. P. Castro
drevicko

1 Answer

You can avoid flattened copies, and the need to extract all the values, by pairing the column indexes with a column vector of row indexes that broadcasts against them:

num = 10
top = np.argpartition(myBigArray, num, axis=1)[:, :num]
myBigArray[np.arange(myBigArray.shape[0])[:, None], top]

For NumPy >= 1.9.0 this will be very efficient and comparable to np.take().
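A self-contained sketch of the above, checked against a full sort. The array size and seed are assumptions for illustration. It also shows np.take_along_axis(), which is a separate function available in NumPy >= 1.15 (so not an option on 1.9) and gives the same result without building the row index by hand.

```python
import numpy as np

rng = np.random.default_rng(0)         # arbitrary seed, illustration only
myBigArray = rng.random((1000, 200))   # assumed size

num = 10
top = np.argpartition(myBigArray, num, axis=1)[:, :num]

# Column vector of row numbers, shape (1000, 1); it broadcasts against
# top, shape (1000, 10), so element [i, j] is myBigArray[i, top[i, j]].
rows = np.arange(myBigArray.shape[0])[:, None]
values = myBigArray[rows, top]

# Same result via take_along_axis (NumPy >= 1.15 only).
values_alt = np.take_along_axis(myBigArray, top, axis=1)
assert np.array_equal(values, values_alt)

# Sanity check: these are the 10 smallest values of each row.
expected = np.sort(myBigArray, axis=1)[:, :num]
assert np.array_equal(np.sort(values, axis=1), expected)

# Optional: order the 10 values (and their indexes) within each row.
order = np.argsort(values, axis=1)
sorted_values = np.take_along_axis(values, order, axis=1)
sorted_indexes = np.take_along_axis(top, order, axis=1)
```

The final argsort only touches the 10 selected columns per row, so it stays cheap compared with sorting the whole array.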

drevicko
Saullo G. P. Castro
  • 2
    I deleted my answer using `flatten()`. I worked out why it didn't work, but couldn't see any easy way to fix it without effectively making a more convoluted version of yours! – three_pineapples Oct 12 '14 at 12:20
  • 1
    gr8! I also learned that `None` plays the same role here as `newaxis`:) btw, `arr` in your answer should be `myBigArray` in case my edit is not accepted.. – drevicko Oct 12 '14 at 22:26