
I'm dealing with a big dataset and basically want to do this:

test = np.random.rand(int(1e7))-0.5
def test0(test):
    return [0 if c<0 else c for c in test]

which does the same as this:

def test1(test):
    for i,dat in enumerate(test):
        if dat<0: 
            test[i] = 0
        else:
            test[i] = dat
    return test

Is there a way to modify test0 to skip the else branch so it works like this:

def test1(test):
    for i,dat in enumerate(test):
        if dat<0: test[i] = 0
    return test

Thanks in advance!

Thesmot

3 Answers


You could try

np.maximum(test, 0)

But `where` is the fastest on my machine:


https://gist.github.com/axil/af6c4adb8c5634ff39ed9f3da1efaa90
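For reference, the `where` and `putmask` variants compared in the gist can be written as follows (a minimal sketch; the variable name `test` follows the question):

```python
import numpy as np

test = np.random.rand(int(1e7)) - 0.5

# np.where builds a new array: 0 where the condition holds, test elsewhere
clipped = np.where(test < 0, 0, test)

# np.putmask modifies the array in place
np.putmask(test, test < 0, 0)
```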

Actually it depends on the amount of negative values in the array:


https://gist.github.com/axil/ce4ecdf1cb0446db47b979c37ed5fba3

Results:
    – `where` is the fastest in most cases and is the only one with a flat curve
    – `putmask` is #2
    – fancy indexing (`test[test < 0] = 0`) is only faster than the others when there's almost nothing to be done (≤10%)
    – `maximum` and `clip` are (surprisingly) slower than the others over the whole range and evidently share an implementation.

The size of the array generally does not matter:

https://gist.github.com/axil/2241e62977f46753caac7005268d5b28

Antony Hatchkins
  • `where` is not fastest as it is still a kind of if/else, just vectorized – Dariusz Krynicki Dec 22 '21 at 11:10
  • @DariuszKrynicki Run this gist on your computer and see it yourself ;) – Antony Hatchkins Dec 22 '21 at 11:14
  • Your test is different: in each loop you execute each function once, while I test the performance of each function over many executions. – Dariusz Krynicki Dec 22 '21 at 11:22
  • This is interesting: where is faster on a small number of executions but slower on a higher number of executions. – Dariusz Krynicki Dec 22 '21 at 11:27
  • @DariuszKrynicki It can't be the case. Python is not numba. Every line of code has a predefined number of bytecode operations. No matter how many of them you execute you'll have the same speed. Modern CPUs have their own quirks and optimizations with branch predictions, but I'd argue we haven't gone that low-level in this task. If you insist on this point I can plot time vs number of repetitions but I don't expect any surprises there. You can try it yourself if you wish. – Antony Hatchkins Dec 22 '21 at 16:26
  • Are those strong curves not due to branch prediction? – Kelly Bundy Dec 22 '21 at 19:37
  • @KellyBundy I think the non-flat curves are optimization artefacts: the expensive operation here is copying non-sequential elements, it is proportional to the amount of elements to copy. First they decide whether to start from `a.copy()` (when % of `True`s is low) or from `np.zeros` (when % is high) , then they add the required zeros or elements of `a`, respectively. `where` does not have this optimization, but uses some sort of low-level vector function. – Antony Hatchkins Dec 22 '21 at 20:58
  • @antony-hatchkins can you do a comparison of np.where vs X[X < 0] = 0 depending on the array length / shape? – Dariusz Krynicki Dec 22 '21 at 22:59
  • @DariuszKrynicki Done – Antony Hatchkins Dec 23 '21 at 07:22

Just use whichever seems to be the fastest option for you:

(1) test[test < 0] = 0

(2) np.where(test < 0, 0, test) # THANKS TO @antony-hatchkins

(3) test.clip(0) # THANKS TO @u12-forward

depending on how you test it.

When you execute each method 1000 times, approach number 2 is fastest; when you measure a single function execution, option number 1 is fastest.

test:

import numpy as np
import timeit
from copy import copy
from functools import partial


def create_data():
    return np.random.rand(int(1e7))-0.5


def func1(data):
    data[data < 0] = 0


def func2(data):
    np.putmask(data, data < 0, 0)


def func3(data):
    np.maximum(data, 0)


def func4(data):
    data.clip(0)


def func5(data):
    np.where(data < 0, 0, data)


if __name__ == '__main__':
    n_loops = 1000
    test = create_data()

    t1 = timeit.Timer(partial(func1, copy(test)))
    t2 = timeit.Timer(partial(func2, copy(test)))
    t3 = timeit.Timer(partial(func3, copy(test)))
    t4 = timeit.Timer(partial(func4, copy(test)))
    t5 = timeit.Timer(partial(func5, copy(test)))

    print(f"func1 (x[x < 0]): timeit {t1.timeit(n_loops)} num test loops {n_loops}")
    print(f"func2 (putmask): timeit {t2.timeit(n_loops)} num test loops {n_loops}")
    print(f"func3 (maximum): timeit {t3.timeit(n_loops)} num test loops {n_loops}")
    print(f"func4 (clip): timeit {t4.timeit(n_loops)} num test loops {n_loops}")
    print(f"func5 (where): timeit {t5.timeit(n_loops)} num test loops {n_loops}")

test results:

func1 (x[x < 0]): timeit 7.2177265440000005 num test loops 1000
func2 (putmask): timeit 13.913492435999999 num test loops 1000
func3 (maximum): timeit 23.065230873999997 num test loops 1000
func4 (clip): timeit 22.768682354000006 num test loops 1000
func5 (where): timeit 23.844607757999995 num test loops 1000
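As pointed out in the comments, `partial(funcN, copy(test))` evaluates `copy(test)` only once, so after the first call the timed function operates on an already non-negative array. A sketch of a benchmark that copies inside the timed callable instead (the copy overhead is then included equally for every method; array size and loop count here are illustrative):

```python
import numpy as np
import timeit

test = np.random.rand(int(1e5)) - 0.5  # smaller array so the sketch runs quickly

def func1(data):
    data[data < 0] = 0

def bench(func, n_loops=100):
    # copy inside the timed callable so every call starts from the
    # original (negative-containing) data
    return timeit.timeit(lambda: func(test.copy()), number=n_loops)

print(f"func1 (x[x < 0]): {bench(func1):.4f}s over 100 loops")
```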

EDIT:

different approach to test data[data < 0] = 0 vs np.where(data < 0, 0, data):

import numpy as np
from time import perf_counter as clock


z = np.random.rand(10**7) - 0.5

start = clock()
for i in range(100):
    a = z.copy()
    np.where(a<0, 0, a)
print(clock() - start)


start = clock()
for i in range(100):
    a = z.copy()
    a[a<0] = 0
print(clock() - start)

test result:

7.9247566030000005
8.021165436000002

test3:

In [1]: import numpy as np
   ...: from copy import copy
   ...:
   ...:
   ...:
   ...: test = np.random.rand(int(1e7))-0.5
   ...:
   ...:
   ...: def func1():
   ...:     data = copy(test)
   ...:     data[data < 0] = 0
   ...:
   ...:
   ...: def func2():
   ...:     data = copy(test)
   ...:     np.putmask(data, data < 0, 0)
   ...:
   ...:
   ...: def func3():
   ...:     data = copy(test)
   ...:     np.maximum(data, 0)
   ...:
   ...:
   ...: def func4():
   ...:     data = copy(test)
   ...:     data.clip(0)
   ...:
   ...:
   ...: def func5():
   ...:     data = copy(test)
   ...:     np.where(data < 0, 0, data)
   ...:

In [2]: timeit func1
16.9 ns ± 0.117 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

In [3]: timeit func2
15.8 ns ± 0.184 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

In [4]: timeit func3
22.1 ns ± 0.287 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [5]: timeit func4
15.6 ns ± 0.0594 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)

In [6]: timeit func5
16.2 ns ± 0.187 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
Dariusz Krynicki
  • Wow that is very fast, even faster than np.where(test<0,0,test). Thanks! – Thesmot Dec 22 '21 at 10:39
  • It's actually a bit slower (19 vs 7.41 ms) for me. – Thesmot Dec 22 '21 at 10:44
  • I've added the `clip` to the comparison – Antony Hatchkins Dec 22 '21 at 11:05
  • Your benchmark is not correct because the operation is applied on the first invocation of the timeit function, and after that you operate on an already non-negative array, which is obviously faster and not the thing you're benchmarking. – Antony Hatchkins Dec 22 '21 at 11:13
  • @antony-hatchkins this is a false statement. I copy `test` each time in each loop, so I operate on the original test values each time. – Dariusz Krynicki Dec 22 '21 at 11:39
  • clip seems to be fastest! – Dariusz Krynicki Dec 22 '21 at 11:43
  • yes, I will edit my answer and put reference to your approach. – Dariusz Krynicki Dec 22 '21 at 11:51
  • @AntonyHatchkins Good point actually! 64.5 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 100 loops each) 54.3 ms ± 638 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 68.7 ms ± 665 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 68.5 ms ± 374 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) 64.6 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 100 loops each) These were my results (just copied the code above), ordered func1 ... func5. Indeed on my hardware the putmask method is the fastest. I've learned a lot, thank you for all answers! – Thesmot Dec 22 '21 at 15:30
  • The @ notifier doesn't work with dashes. Use `@FirstLast` instead of `@first-last`. – Antony Hatchkins Dec 22 '21 at 16:12
  • This is a true statement. What do you think the result of this gist will be? https://gist.github.com/axil/9b53f42901e375db156891cddb033aee Try it ;) – Antony Hatchkins Dec 22 '21 at 16:13
  • Yes, test3 is a bit better, but you've missed something ;) 17ns is the speed of NOP in python. You measure how fast python does nothing. Try to find what you've missed. – Antony Hatchkins Dec 22 '21 at 16:16
  • Here you go, select is faster on larger arrays https://gist.github.com/0xdarkman/2ae254d9111c16a0dbe3f405d6f5bc94 – Dariusz Krynicki Dec 22 '21 at 22:50
  • @antony-hatchkins: select is fastest > where > clip when we operate on larger arrays – Dariusz Krynicki Dec 22 '21 at 22:52
  • Yes, your last gist is much better than what you've written so far, yet you've forgotten something there, too, so the results are corrupted once again but in a slightly different way than before ) – Antony Hatchkins Dec 23 '21 at 07:27
  • All my tests have been created in the same way, so I don't know what you're talking about: copy array -> call function -> calculate time delta. Please either explain what you have spotted as incorrect in it, or otherwise I will perceive it as FUD. There is nothing different in my recent tests. – Dariusz Krynicki Dec 23 '21 at 08:13
  • by the way, have a look yourself and compare np.where on larger size arrays. – Dariusz Krynicki Dec 23 '21 at 08:24

Use np.ndarray.clip like test.clip(min=0):

>>> test.clip(0)
array([0.        , 0.11819274, 0.36379089, ..., 0.        , 0.13401746,
       0.        ])
>>> 

Documentation of np.ndarray.clip:

Return an array whose values are limited to [min, max]. One of max or min must be given.
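Note that `clip` returns a new array rather than modifying `test` itself. If in-place replacement is wanted, the result can be written back via the standard NumPy `out` parameter (a small sketch):

```python
import numpy as np

test = np.random.rand(int(1e7)) - 0.5

# clip returns a copy by default; `test` itself is unchanged
result = test.clip(min=0)

# writing the result back via `out` clips in place, avoiding a new allocation
test.clip(min=0, out=test)
```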

U13-Forward