
Consider a set of numbers:

In [8]: import numpy as np

In [9]: x = np.array([np.random.random() for i in range(10)])

In [10]: x
Out[10]: 
array([ 0.62594394,  0.03255799,  0.7768568 ,  0.03050498,  0.01951657,
        0.04767246,  0.68038553,  0.60036203,  0.3617409 ,  0.80294355])

Now I want to transform this set into another set y in the following way: for every element in x, the corresponding element of y is the number of other elements in x that are less than it. For example, for the x given above, y would look like:

In [25]: y
Out[25]: array([ 6.,  2.,  8.,  1.,  0.,  3.,  7.,  5.,  4.,  9.])

Now, I can do this using simple python loops:

In [15]: y = np.zeros(len(x))

In [16]: for i in range(len(x)):
    ...:     tot = 0
    ...:     for j in range(len(x)):
    ...:         if x[i] > x[j]: tot += 1
    ...:     y[i] = int(tot)

However, when the length of x is very large, this code becomes extremely slow. I was wondering whether some numpy magic could be brought to the rescue. For example, to select all the elements less than 0.5, I would simply use a Boolean mask:

In [19]: z = x[x < 0.5]

In [20]: z
Out[20]: array([ 0.03255799,  0.03050498,  0.01951657,  0.04767246,  0.3617409 ])

Can something like this be used so that the same thing could be achieved much faster?


4 Answers


What you actually need to do is get the inverse of the sorting order of your array:

import numpy as np

x = np.random.rand(10)
# y is the inverse permutation of argsort: y[i] is the rank of x[i],
# i.e. the number of elements smaller than x[i] (values assumed distinct)
y = np.empty(x.size, dtype=np.int64)
y[x.argsort()] = np.arange(x.size)

Example run (in IPython):

In [367]: x
Out[367]: 
array([ 0.09139335,  0.29084225,  0.43560987,  0.92334644,  0.09868977,
        0.90202354,  0.80905083,  0.4801967 ,  0.99086213,  0.00933582])

In [368]: y
Out[368]: array([1, 3, 4, 8, 2, 7, 6, 5, 9, 0])
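
A quick way to see that y holds the rank of each element (my own check, not part of the original answer): indexing the sorted array with y recovers x, as long as all values are distinct, which is almost surely the case for np.random.rand.

# Sanity check, assuming distinct values: the y[i]-th smallest element is x[i]
assert np.array_equal(np.sort(x)[y], x)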

Alternatively, if you want the number of elements greater than each corresponding element in x, you have to reverse the sort from ascending to descending. One option is to swap the construction of the indexing:

y_rev = np.empty(x.size, dtype=np.int64)
y_rev[x.argsort()] = np.arange(x.size)[::-1]

another, as @unutbu suggested in a comment, is to map the original array to the new one:

y_rev = x.size - y - 1
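
Both constructions give the same result; a minimal sanity check (my addition, reusing x and y from above):

# The index swap and the arithmetic map agree (values assumed distinct)
y_rev_swap = np.empty(x.size, dtype=np.int64)
y_rev_swap[x.argsort()] = np.arange(x.size)[::-1]
assert np.array_equal(y_rev_swap, x.size - y - 1)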

Here's one approach using np.searchsorted -

np.searchsorted(np.sort(x),x)

Another one mostly based on @Andras Deak's solution using argsort() -

x.argsort().argsort()

Sample run -

In [359]: x
Out[359]: 
array([ 0.62594394,  0.03255799,  0.7768568 ,  0.03050498,  0.01951657,
        0.04767246,  0.68038553,  0.60036203,  0.3617409 ,  0.80294355])

In [360]: np.searchsorted(np.sort(x),x)
Out[360]: array([6, 2, 8, 1, 0, 3, 7, 5, 4, 9])

In [361]: x.argsort().argsort()
Out[361]: array([6, 2, 8, 1, 0, 3, 7, 5, 4, 9])
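
One behavioral difference worth noting (my observation, not from the original answer): with repeated values, np.searchsorted with its default side='left' still returns the count of strictly smaller elements, whereas the double argsort gives tied values distinct ranks (e.g. array([1, 0, 2]) here, with the order among ties depending on the sort):

In [362]: x_ties = np.array([0.5, 0.2, 0.5])

In [363]: np.searchsorted(np.sort(x_ties), x_ties)
Out[363]: array([1, 0, 1])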

In addition to the other answers, another solution using a sum of Boolean comparisons could be:

sum(x > i for i in x)

For your example:

In [10]: x
Out[10]: 
array([ 0.62594394,  0.03255799,  0.7768568 ,  0.03050498,  0.01951657,
        0.04767246,  0.68038553,  0.60036203,  0.3617409 ,  0.80294355])

In [11]: y = sum(x > i for i in x)

In [12]: y
Out[12]: array([6, 2, 8, 1, 0, 3, 7, 5, 4, 9])
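
The same counting can also be done in a single vectorized step with broadcasting (a sketch, not from the original answer; note that it materializes an n×n Boolean matrix, so memory grows quadratically with the array length):

In [13]: (x[:, None] > x).sum(axis=1)
Out[13]: array([6, 2, 8, 1, 0, 3, 7, 5, 4, 9])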

I wanted to contribute to this post by providing some testing of @Andras Deak's solution versus argsort again (that is, x.argsort().argsort()).

It would appear that argsort again is quicker for short arrays. The idea is to find the array length at which the balance shifts.

I'll define three functions

  • construct which is Andras Deak's solution
  • argsortagain which is obvious
  • attempted_optimal which trades off at len(a) == 400

functions

import numpy as np
import pandas as pd
from timeit import timeit

def argsortagain(s):
    # s is a precomputed argsort of the data; argsorting it again
    # yields the inverse permutation, i.e. the ranks
    return s.argsort()

def construct(s):
    # Andras Deak's construction: scatter arange into the positions given by s
    u = np.empty(s.size, dtype=np.int64)
    u[s] = np.arange(s.size)

    return u

def attempted_optimal(s):
    # switch strategy at the empirically observed crossover length
    return argsortagain(s) if len(s) < 400 else construct(s)
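
As a quick consistency check (my addition, using the definitions above), all three functions return the same ranks for a given precomputed argsort:

a = np.random.rand(100)
s = a.argsort()
assert np.array_equal(construct(s), argsortagain(s))
assert np.array_equal(construct(s), attempted_optimal(s))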

testing

results = pd.DataFrame(
    index=pd.RangeIndex(10, 610, 10, name='len'),
    columns=pd.Index(['construct', 'argsortagain', 'attempted_optimal'], name='function'))

for i in results.index:
    a = np.random.rand(i)
    s = a.argsort()
    for j in results.columns:
        results.loc[i, j] = timeit(
            '{}(s)'.format(j),
            'from __main__ import {}, s'.format(j),
            number=10000)

results.plot()

[Plot: timings of construct, argsortagain, and attempted_optimal versus array length]

conclusion

attempted_optimal does what it's supposed to do. But I'm not sure it's worth it for the marginal benefit gained over the range of array lengths (sub-400) where it hardly matters. I'd advocate for construct only.

This analysis helped me reach this conclusion.
