change numpy array to start from zero

Question

I am trying to figure out a way to take a numpy array of integers, then change the entries such that the smallest is 0, the second smallest is 1, etc.

E.g.

Start with this

In [13]: a = numpy.array([[1, 2, 10],[1, 2, 99]])

In [14]: a
Out[14]: 
array([[ 1,  2, 10],
       [ 1,  2, 99]])

And get this:

array([[ 0,  1, 2],
       [ 0,  1, 3]])

I can start to see the way through with numpy.unique(), e.g.

In [19]: range(len(b))
Out[19]: [0, 1, 2, 3]

In [20]: b = numpy.unique(a)

In [21]: b
Out[21]: array([ 1,  2, 10, 99])

In [22]: c = range(len(b))

In [23]: c
Out[23]: [0, 1, 2, 3]

Seems like I should now be able to use b and c to translate from one array to the other. But what's the best (and quickest) way to do this?

possible duplicate of [Ranking of numpy array with possible duplicates](http://stackoverflow.com/questions/14671013/ranking-of-numpy-array-with-possible-duplicates) — YXD, Jan 23 '14 at 00:51

score 5 · Accepted Answer · answered Jan 23 '14 at 00:40


Don't know about quickest, but if you have scipy available, you can use scipy.stats.rankdata:

>>> import numpy as np
>>> import scipy.stats
>>> a = np.array([[1, 2, 10],[1, 2, 99]])
>>> scipy.stats.rankdata(a,'dense').reshape(a.shape)-1
array([[ 0.,  1.,  2.],
       [ 0.,  1.,  3.]])

(The reshape is needed because it flattens the data first, and the -1 because it starts its ranks at 1.)
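
As a rough sketch of the same idea wrapped in a function (the helper name `dense_rank_zero_based` is made up for illustration and is not part of SciPy): depending on the SciPy version, the ranks may come back as floats, so an explicit integer cast gives output that matches the array in the question.

```python
import numpy as np
from scipy.stats import rankdata


def dense_rank_zero_based(arr):
    """Hypothetical helper: zero-based dense ranks with the input's shape."""
    arr = np.asarray(arr)
    # Flatten explicitly, then rank; 'dense' gives tied values the same rank
    # and leaves no gaps between consecutive ranks.
    ranks = rankdata(arr.ravel(), method='dense')
    # Ranks start at 1, so shift to 0; cast because some versions return floats.
    return (ranks - 1).astype(int).reshape(arr.shape)


a = np.array([[1, 2, 10], [1, 2, 99]])
result = dense_rank_zero_based(a)
print(result)
```

The cast is safe here because dense ranks are always whole numbers.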

DSM


Thanks, and thanks particularly for the added explanations. Very neat solution. – roblanf Jan 23 '14 at 21:07

Tal Darom · Answer 2 · 2014-01-23T01:16:41.347

2

The most straightforward way is to use argsort():

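import numpy

a = numpy.array([0, 1, 1, 2])
u, ind = numpy.unique(a, return_inverse=True)  # u: sorted unique values; ind: position of each element of a in u
u = u.argsort().argsort()  # rank of each unique value
ret = u[ind]  # map every element of a to its rank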

edited Jan 23 '14 at 01:16

answered Jan 23 '14 at 00:46


That won't work if there are repeated indices, e.g. `[0, 1, 1, 2]`. – DSM Jan 23 '14 at 00:50
Notice that `np.unique` always returns sorted values, so `u.argsort().argsort() == np.arange(len(u))`. – Bi Rico Jan 23 '14 at 03:44
@BiRico is right, so you can just use `ind` and drop the last two lines. – askewchan May 02 '14 at 02:43
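
The point made in these comments can be checked with a small sketch (the sample array and the variable names `ranks_of_unique` and `identity` are chosen here for illustration): because `np.unique` returns its values already sorted, ranking them again is a no-op, and the inverse indices are the answer on their own.

```python
import numpy as np

a = np.array([0, 1, 1, 2, 10, 10])
u, ind = np.unique(a, return_inverse=True)

# np.unique returns sorted unique values, so their ranks are just 0..len(u)-1.
ranks_of_unique = u.argsort().argsort()
identity = np.arange(len(u))

# Indexing the identity permutation with ind gives back ind unchanged.
ret = ranks_of_unique[ind]
print(ret)
```

So for this input the double `argsort` changes nothing, and `ind` alone already holds the zero-based ranks, repeated values included.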

score 2 · Answer 3 · answered Jan 23 '14 at 03:37

I'll give you two choices; the first seems cleaner somehow:

import numpy

a = numpy.array([[1, 2, 10],[1, 2, 99]])
uniq, inv = numpy.unique(a, return_inverse=True)  # inv: index of each element of a in uniq
result = inv.reshape(a.shape)  # restore the original 2-D shape

I like this one because it works with older versions of numpy that don't have return_inverse:

import numpy

a = numpy.array([[1, 2, 10],[1, 2, 99]])
uniq = numpy.unique(a)  # sorted unique values
result = uniq.searchsorted(a)  # position of each element of a within uniq
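
A quick sanity check, as a sketch with an unsorted sample array picked for illustration, that the two choices agree. `searchsorted` works here because every element of `a` is guaranteed to be present in the sorted unique values, so its leftmost insertion point is exactly its index.

```python
import numpy as np

a = np.array([[7, 3, 50], [3, 7, 99]])

# Choice 1: inverse indices from np.unique, reshaped to the input's shape.
uniq, inv = np.unique(a, return_inverse=True)
via_inverse = inv.reshape(a.shape)

# Choice 2: binary-search each element's position in the sorted unique values.
via_search = np.searchsorted(uniq, a)

print(via_inverse)
print(via_search)
```

Both results keep the shape of `a`; the explicit `reshape` in the first choice is there because the inverse indices are not guaranteed to keep the input's shape in every NumPy version.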
