
Let's assume that we have an array A of shape (100,) and B of shape (10,). Both contain values in [0,1].

How do we get the count of elements in A greater than each value in B? I expect an array of shape (10,), where the first element is "how many values in A are greater than B[0]", the second is "how many are greater than B[1]", and so on.

Without using loops.

I tried the following, but it didn't work:

import numpy as np
import numpy.random as rdm

A = rdm.rand(100)
B = np.linspace(0,1,10)
def occ(z: float) -> float:
    return np.count_nonzero(A > z)
occ(B)

Python won't apply my function element-wise over B, which is why I get:

operands could not be broadcast together with shapes (10,) (100,) 

I've also tried with np.greater but I've got the same issue ...

Mad Physicist

3 Answers


Slow But Simple

The error message is cryptic if you don't understand it, but it's telling you what to do. Array dimensions are broadcast together by lining them up starting with the right edge. This is especially helpful if you split your operation into two parts:

  1. Create a (100, 10) mask showing which elements of A are greater than which elements of B:

     mask = A[:, None] > B
    
  2. Sum the result of the previous operation along the axis corresponding to A:

     result = np.count_nonzero(mask, axis=0)
    

    OR

     result = np.sum(mask, axis=0)
    

This can be written as a one-liner:

(A[:, None] > B).sum(0)

OR

np.count_nonzero(A[:, None] > B, axis=0)

You can switch the dimensions and place B in the first axis to get the same result:

(A > B[:, None]).sum(1)
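Putting the two steps together, here is a minimal runnable sketch of the broadcasting approach (the seed is arbitrary, chosen only to make the run reproducible):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary fixed seed for reproducibility
A = rng.random(100)             # shape (100,)
B = np.linspace(0, 1, 10)       # shape (10,)

# A[:, None] has shape (100, 1); broadcasting against B's (10,)
# lines up the trailing axes and yields a (100, 10) boolean mask.
mask = A[:, None] > B
assert mask.shape == (100, 10)

# Summing along the axis that corresponds to A gives the counts.
result = np.count_nonzero(mask, axis=0)
assert result.shape == (10,)

# The transposed orientation produces the same answer.
assert np.array_equal(result, (A > B[:, None]).sum(1))
```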

Fast and Elegant

Taking a totally different (but likely much more efficient) approach, you can use np.searchsorted:

A.sort()
result = A.size - np.searchsorted(A, B)

By default, searchsorted returns the left index at which each element of B would be inserted into A to keep it sorted, i.e. the number of elements of A strictly less than that value. Subtracting from A.size therefore gives how many elements of A are greater than (or equal to) that value, which for continuous random data is effectively the strict count. If ties between A and B are possible, pass side='right' to get a strictly-greater count.
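A small sketch (with made-up values containing a tie) showing how the side argument changes the result:

```python
import numpy as np

A = np.array([0.1, 0.5, 0.5, 0.9])  # already sorted, with a tie at 0.5
B = np.array([0.0, 0.5, 1.0])

# side='left': index of the first element >= each B value,
# so A.size - index counts the elements >= B (ties included).
ge_counts = A.size - np.searchsorted(A, B, side='left')

# side='right': index just past the last element <= each B value,
# so A.size - index counts the elements strictly > B.
gt_counts = A.size - np.searchsorted(A, B, side='right')

print(ge_counts)  # [4 3 0]
print(gt_counts)  # [4 1 0]
```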

Benchmarks

Here, the algos are labeled as follows:

  • B0: (A[:, None] > B).sum(0)
  • B1: (A > B[:, None]).sum(1)
  • HH: np.cumsum(np.histogram(A, bins=B)[0][::-1])[::-1]
  • SS: A.sort(); A.size - np.searchsorted(A, B)
+--------+--------+----------------------------------------+
| A.size | B.size |        Time (B0 / B1 / HH / SS)        |
+--------+--------+----------------------------------------+
|    100 |     10 |  20.9 µs / 15.7 µs / 68.3 µs / 8.87 µs |
+--------+--------+----------------------------------------+
|   1000 |     10 |   118 µs / 57.2 µs /  139 µs / 17.8 µs |
+--------+--------+----------------------------------------+
|  10000 |     10 |   987 µs /  288 µs / 1.23 ms /  131 µs |
+--------+--------+----------------------------------------+
| 100000 |     10 |  9.48 ms / 2.77 ms / 13.4 ms / 1.42 ms |
+--------+--------+----------------------------------------+
|    100 |    100 |  70.7 µs / 63.8 µs /   71 µs / 11.4 µs |
+--------+--------+----------------------------------------+
|   1000 |    100 |   518 µs /  388 µs /  148 µs / 21.6 µs |
+--------+--------+----------------------------------------+
|  10000 |    100 |  4.91 ms / 2.67 ms / 1.22 ms /  137 µs |
+--------+--------+----------------------------------------+
| 100000 |    100 |  57.4 ms / 35.6 ms / 13.5 ms / 1.42 ms |
+--------+--------+----------------------------------------+

Memory layout matters: B1 is consistently faster than B0. This happens because summing contiguous (cached) elements, i.e. along the last axis in C order, is faster than having to stride across rows to get the next element. Broadcasting performs well for small B, but keep in mind that both the time and space complexity of B0 and B1 is O(A.size * B.size). The complexity of HH and SS is about O(A.size * log(A.size)), but SS is implemented much more efficiently than HH because it can make stronger assumptions about the data.
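For reference, a sketch of the kind of harness such timings can be reproduced with (the seed and sizes are arbitrary, and absolute numbers will vary by machine). It also checks that the methods agree on the result:

```python
import timeit
import numpy as np

rng = np.random.default_rng(42)  # arbitrary fixed seed
A = rng.random(10_000)
B = np.linspace(0, 1, 100)
A_sorted = np.sort(A)            # SS pays this cost once up front

methods = {
    'B0': lambda: (A[:, None] > B).sum(0),
    'B1': lambda: (A > B[:, None]).sum(1),
    # histogram needs an extra top edge so the last bin catches values > B[-1]
    'HH': lambda: np.cumsum(
        np.histogram(A, bins=np.append(B, A.max() + 1))[0][::-1])[::-1],
    'SS': lambda: A_sorted.size - np.searchsorted(A_sorted, B),
}

# B0/B1 count strictly greater; HH/SS count greater-or-equal.
assert np.array_equal(methods['B0'](), methods['B1']())
assert np.array_equal(methods['HH'](), methods['SS']())
# The two families coincide here because exact ties between the random
# data and the bin edges are vanishingly unlikely.
assert np.array_equal(methods['B0'](), methods['SS']())

for name, fn in methods.items():
    t = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f'{name}: {t * 1e6:.1f} µs')
```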

Mad Physicist
  • Works perfectly now. Is there a gain in performance using this instead of a loop? Would you use this solution or another? – Sleep Water Tea Sep 15 '20 at 15:14
  • @SleepWaterTea. I would use the `searchsorted` solution – Mad Physicist Sep 15 '20 at 15:16
  • Ha, I had not noticed who wrote this answer :-) The fast approach is probably the best answer for this particular question. However, it is important to understand broadcasting in general, as it applies to many other cases. Except that sort operations can be VERY slow if the arrays are large. – Javier Gonzalez Sep 15 '20 at 15:19
  • @JavierGonzalez. That's exactly why I started with a preamble about the simple approach :) – Mad Physicist Sep 15 '20 at 15:27
  • Right. I was just pointing it out. One thing: the first approach is faster in my machine if the length of A is larger than about 1000. As size increases it scales better. – Javier Gonzalez Sep 15 '20 at 15:40
  • @JavierGonzalez. I dunno. `searchsorted` is always faster on my machine, by a lot. I posted some benchmarks. – Mad Physicist Sep 15 '20 at 16:05
  • @SleepWaterTea. I added some benchmarks to show why I would always use the `searchsorted` approach – Mad Physicist Sep 15 '20 at 16:05
  • Nice benchmark table. To add to your estimates of complexity: The complexity of `HH` partly comes from the sorting, which you can realistically expect it to be `O(A.size * log(A.size))`, so it would be `O(A.size * log(A.size) * B.size)` – Javier Gonzalez Sep 15 '20 at 16:28
  • @Javier. The elements of B must already be sorted, so it's more like `O(A.size * log(B.size))`: each element of A gets a binary search into B. Searchsorted is faster because it does the opposite: placing each element of B into A is `O(B.size * log(A.size))` sorting A is `O(A.size * log(A.size))`. The two are comparable, but sort and searchsorted just have much less overhead than making a histogram – Mad Physicist Sep 15 '20 at 16:41
  • Sorry, in the previous comment I meant `SS` (which you did not explicitly add in your answer). The complexity of `SS` is `O((A.size + B.size) * log(A.size) )`. It follows exactly from your last comment. Corrected it. – Javier Gonzalez Sep 15 '20 at 16:47
  • @Javier. Agreed. I didn't list it explicitly since if `A.size > B.size`, `O(A.size + B.size) = O(2 * A.size) = O(A.size)` – Mad Physicist Sep 15 '20 at 17:08

I think you can use np.histogram for this job

import numpy as np
import numpy.random as rdm

A = rdm.rand(100)
B = np.linspace(0, 1, 10)
np.histogram(A, bins=B)[0]

Gives the output

array([10,  9,  8, 11,  9, 14, 10, 12, 17])

Note that 10 edges define only 9 bins; the count for values above B[9] is missing, but that bin would always be empty anyway because there are no values > 1.

And compute the cumsum backwards

np.cumsum(np.histogram(A, bins=B)[0][::-1])[::-1]

Output

array([100,  90,  81,  73,  62,  53,  39,  29,  17])
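Because 10 edges define only 9 bins, the result above has 9 entries rather than the expected 10. A sketch of the complete pipeline that appends an extra edge above A.max() (as suggested in the comment below) so the result lines up with B:

```python
import numpy as np
import numpy.random as rdm

A = rdm.rand(100)
B = np.linspace(0, 1, 10)

# Append an edge above A's maximum so the last bin catches values > B[-1];
# histogram then returns B.size counts instead of B.size - 1.
edges = np.append(B, A.max() + 1)
counts = np.histogram(A, bins=edges)[0]

# Reversed cumulative sum: result[i] = number of elements >= B[i].
result = np.cumsum(counts[::-1])[::-1]
assert result.shape == B.shape
```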
Michael Szczesny
  • You have to be careful here, because `B` is interpreted as bin edges. Make sure you append `A.max() + 1` or similar to `B`, and don't forget to take the cumulative sum at the end. Aside from that, this is a clever solution. – Mad Physicist Sep 15 '20 at 14:58
np.sum(A>B.reshape((-1,1)), axis=1)

Explanation

You need to understand broadcasting and reshaping for this. By reshaping B to shape (len(B), 1), it can be broadcast against A to produce an array of shape (len(B), len(A)) containing all pairwise comparisons. Then you sum over axis 1 (along A).

In other words, A > B does not work because A has 100 entries and B has 10. If you read the broadcasting rules, you will see that NumPy starts with the last dimension: if the two sizes are equal, it can compare element-to-element; if one of them is 1, that dimension is stretched or "copied" to match the other; if they are unequal and neither is 1, it fails.

With a shorter example:

A = np.array([0.5112744 , 0.21696187, 0.14710105, 0.98581087, 0.50053359,
              0.54954654, 0.81217522, 0.50009166, 0.42990167, 0.56078499])
B = np.array([0.25, 0.5, 0.75])

the transpose of (A>B.reshape((-1,1))) (shown transposed for readability) is

np.array([[ True,  True, False],
          [False, False, False],
          [False, False, False],
          [ True,  True,  True],
          [ True,  True, False],
          [ True,  True, False],
          [ True,  True,  True],
          [ True,  True, False],
          [ True, False, False],
          [ True,  True, False]])

and np.sum(A>B.reshape((-1,1)), axis=1) is

array([8, 7, 2])
Javier Gonzalez