
I am trying to figure out a way to take a numpy array of integers and replace each entry with its rank among the distinct values, so that the smallest value becomes 0, the second smallest becomes 1, and so on (equal values get the same number).

E.g.

Start with this:

In [13]: a = numpy.array([[1, 2, 10],[1, 2, 99]])

In [14]: a
Out[14]: 
array([[ 1,  2, 10],
       [ 1,  2, 99]])

And get this:

array([[0, 1, 2],
       [0, 1, 3]])

I can start to see the way through with numpy.unique(), e.g.

In [20]: b = numpy.unique(a)

In [21]: b
Out[21]: array([ 1,  2, 10, 99])

In [22]: c = range(len(b))

In [23]: c
Out[23]: [0, 1, 2, 3]

Seems like I should now be able to use b and c to translate from one array to the other. But what's the best (and quickest) way to do this?
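
The b/c idea above does work as-is with a plain dict lookup. A minimal sketch, assuming the arrays are small enough that a Python-level loop over every entry is acceptable (the names `lookup` and `result` are illustrative):

```python
import numpy as np

a = np.array([[1, 2, 10], [1, 2, 99]])
b = np.unique(a)                    # sorted distinct values: 1, 2, 10, 99
c = range(len(b))                   # target labels: 0, 1, 2, 3
lookup = dict(zip(b.tolist(), c))   # value -> label, {1: 0, 2: 1, 10: 2, 99: 3}

# Translate entry by entry, then restore the original 2-D shape.
# Fine for small arrays; the vectorized answers below scale much better.
result = np.array([lookup[x] for x in a.ravel().tolist()]).reshape(a.shape)
print(result)
```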

roblanf
    possible duplicate of [Ranking of numpy array with possible duplicates](http://stackoverflow.com/questions/14671013/ranking-of-numpy-array-with-possible-duplicates) – YXD Jan 23 '14 at 00:51

3 Answers

Don't know about quickest, but if you have scipy available, you can use scipy.stats.rankdata:

>>> import numpy as np
>>> import scipy.stats
>>> a = np.array([[1, 2, 10],[1, 2, 99]])
>>> scipy.stats.rankdata(a, method='dense').reshape(a.shape) - 1
array([[ 0.,  1.,  2.],
       [ 0.,  1.,  3.]])

(The reshape is needed because it flattens the data first, and the -1 because it starts its ranks at 1.)
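
For readers without SciPy, here is a rough sketch of what a dense ranking computes: sort the flattened values, start a new rank wherever the sorted value changes, then scatter those ranks back to the original positions. `dense_rank0` is a hypothetical helper name, and this is an illustration, not SciPy's implementation; it returns zero-based integer ranks, so no `- 1` or reshape is needed by the caller.

```python
import numpy as np

def dense_rank0(a):
    """Zero-based dense rank of every entry of a (illustrative sketch)."""
    flat = np.asarray(a).ravel()
    order = np.argsort(flat)          # positions of flat in ascending value order
    sorted_vals = flat[order]
    # True where a new distinct value starts in sorted order, False for repeats
    new_value = np.empty(flat.size, dtype=bool)
    new_value[:1] = True
    new_value[1:] = sorted_vals[1:] != sorted_vals[:-1]
    ranks_sorted = np.cumsum(new_value) - 1   # 0 for the smallest value, etc.
    ranks = np.empty(flat.size, dtype=np.intp)
    ranks[order] = ranks_sorted               # undo the sort
    return ranks.reshape(np.shape(a))

print(dense_rank0(np.array([[1, 2, 10], [1, 2, 99]])))
```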

DSM

The most straightforward way is to use argsort():

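import numpy
a = numpy.array([0, 1, 1, 2])
# u holds the sorted unique values; ind maps each entry of a to its index in u
u, ind = numpy.unique(a, return_inverse=True)
# rank of each unique value (u is already sorted, so this is 0..len(u)-1)
u = u.argsort().argsort()
ret = u[ind]
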
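
Because `numpy.unique` already returns its values sorted, the double `argsort` in this answer maps `u` to `0..len(u)-1`, so `ret` ends up identical to `ind`. The double-argsort trick only matters for ranking values that are not already sorted. A quick check (the name `ranks_of_u` is illustrative):

```python
import numpy as np

a = np.array([0, 1, 1, 2])
u, ind = np.unique(a, return_inverse=True)
# Ranking an already-sorted array just yields 0, 1, ..., len(u) - 1
ranks_of_u = u.argsort().argsort()
assert (ranks_of_u == np.arange(len(u))).all()
ret = ranks_of_u[ind]
# ...so the final lookup returns exactly the inverse indices
assert (ret == ind).all()
print(ret)  # [0 1 1 2]
```
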
Tal Darom

I'll give you two choices; the first seems cleaner somehow:

import numpy
a = numpy.array([[1, 2, 10],[1, 2, 99]])
# inv[i] is the index of a.flat[i] in the sorted array uniq
uniq, inv = numpy.unique(a, return_inverse=True)
result = inv.reshape(a.shape)
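
Since `result` holds indices into `uniq`, indexing `uniq` with it rebuilds the original array, so the relabelling loses no information. A small sanity check of that round trip:

```python
import numpy as np

a = np.array([[1, 2, 10], [1, 2, 99]])
uniq, inv = np.unique(a, return_inverse=True)
result = inv.reshape(a.shape)   # dense labels with a's 2-D shape
print(result)
# uniq acts as the decoding table: uniq[label] gives back the original value
assert (uniq[result] == a).all()
```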

I like this one because it works with older versions of numpy that don't have return_inverse:

import numpy
a = numpy.array([[1, 2, 10],[1, 2, 99]])
uniq = numpy.unique(a)
# index of each value of a within the sorted array uniq
result = uniq.searchsorted(a)

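
The `searchsorted` version works because `uniq` is sorted and contains every value of `a`: with the default `side='left'`, `searchsorted` returns the index of the first element of `uniq` that is not less than the value, which is exactly that value's own position. A quick check that both choices agree (`result_ss` is an illustrative name):

```python
import numpy as np

a = np.array([[1, 2, 10], [1, 2, 99]])
uniq = np.unique(a)
# side='left' (the default): index of the first element of uniq >= x;
# every x in a occurs in uniq, so that index is x's own position
result_ss = uniq.searchsorted(a)
_, inv = np.unique(a, return_inverse=True)
assert (result_ss == inv.reshape(a.shape)).all()
print(result_ss)
```
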
Bi Rico