
Say you have a NumPy vector [0,3,1,1,1]. Running argsort on it gives [0,2,3,4,1], but all the ones are the same! What I want is an efficient way to shuffle the indices of identical values. Any idea how to do that without a while loop with two indices over the sorted vector?

numpy.array([0,3,1,1,1]).argsort()
abcd
Hanan Shteingart

2 Answers


Use lexsort: np.lexsort((b,a)) means sort by a, then by b (the last key is the primary one). If b is random, equal values in a come out in random order:

>>> a
array([0, 3, 1, 1, 1])
>>> b=np.random.random(a.size)
>>> b
array([ 0.00673736,  0.90089115,  0.31407214,  0.24299867,  0.7223546 ])
>>> np.lexsort((b,a))
array([0, 3, 2, 4, 1])
>>> a.argsort()
array([0, 2, 3, 4, 1])
>>> a[[0, 3, 2, 4, 1]]
array([0, 1, 1, 1, 3])
>>> a[[0, 2, 3, 4, 1]]
array([0, 1, 1, 1, 3])
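
The session above can be wrapped in a small helper. This is only a sketch for 1-D input: the function name argsort_random_ties and the optional rng parameter are made up for illustration, and it uses np.random.default_rng (NumPy 1.17+) instead of np.random.random.

```python
import numpy as np

def argsort_random_ties(a, rng=None):
    """Argsort a 1-D array, ordering equal values randomly.

    Hypothetical helper name; `rng` is an optional np.random.Generator.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a)
    # Random secondary key: it only matters where values of `a` are equal.
    tiebreak = rng.random(a.size)
    # lexsort treats the LAST key as the primary one, so `a` goes last.
    return np.lexsort((tiebreak, a))

a = np.array([0, 3, 1, 1, 1])
idx = argsort_random_ties(a, rng=np.random.default_rng(0))
print(idx)      # indices 2, 3, 4 appear in some random order in the middle
print(a[idx])   # values are sorted regardless of how the ties were broken
```

Because the random key is only a tie-breaker, this works for floats as well as integers.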
CT Zhu

This is a bit of a hack, but if your array contains integers only you could add random values and argsort the result. np.random.rand gives you results in [0, 1) so in this case you're guaranteed to maintain the order for non-identical elements.

>>> import numpy as np
>>> arr = np.array([0,3,1,1,1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 3, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 3, 4, 2, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 2, 3, 4, 1])
>>> np.argsort(arr + np.random.rand(*arr.shape))
array([0, 4, 2, 3, 1])

Here we see index 0 is always first in the argsort result and index 1 is last, but the rest of the results are in a random order.

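In general you could generate random values bounded by the smallest nonzero gap between sorted values, i.e. the minimum of the positive entries of np.diff(np.sort(arr)), so that non-identical elements can never swap. You might run into precision issues at some point, though.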
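
A rough sketch of that generalization for non-integer data follows. It assumes the noise has to stay below the smallest nonzero gap between sorted values (not the largest one) for distinct elements to keep their order. The name argsort_jitter is hypothetical, and the floating-point caveat still applies when the gaps are tiny relative to the values.

```python
import numpy as np

def argsort_jitter(arr, rng=None):
    """Argsort with random tie order by adding small bounded noise.

    Hypothetical helper; assumes gaps are not tiny relative to the values.
    """
    rng = np.random.default_rng() if rng is None else rng
    arr = np.asarray(arr, dtype=float)
    gaps = np.diff(np.sort(arr))
    positive = gaps[gaps > 0]
    # Smallest gap between distinct values; any scale works if all are equal.
    scale = positive.min() if positive.size else 1.0
    # Noise lies in [0, scale), so distinct values should not cross.
    noise = rng.random(arr.shape) * scale
    return np.argsort(arr + noise)

arr = np.array([0.0, 1.5, 0.5, 0.5, 0.5])
idx = argsort_jitter(arr, rng=np.random.default_rng(0))
print(arr[idx])  # sorted values; the three 0.5 entries are in random order
```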

YXD
  • Isn't it too big an assumption to assume the vector contains only integers? – CT Zhu Nov 25 '13 at 17:41
  • Well, I don't know. I was going on what was given in the question, and it is acknowledged in my answer. Your solution is neat though. – YXD Nov 26 '13 at 08:26