3

I am getting weird results from numpy sqrt method when applying it on an array of integers with a where condition. See below.

With integers:

a = np.array([1, 4, 9])

np.sqrt(a, where=(a>5))
Out[3]: array([0. , 0.5, 3. ])

With floats:

a = np.array([1., 4., 9.])

np.sqrt(a, where=(a>5))
Out[25]: array([1., 4., 3.])

Is it a bug or a misunderstanding of how this method works?

jared
  • 4,165
  • 1
  • 8
  • 31
Antoine101
  • 167
  • 1
  • 8
  • which numpy version do you have? Testing on numpy 1.24.2 I get `np.sqrt(a, where=(a>5)).round(3)` -> `array([0., 0., 3.])` – mozway Jul 19 '23 at 08:11
  • Works also for me with version `1.21.5` – Marcello Zago Jul 19 '23 at 08:19
  • for me using the same one, it flips from `array([1., 4., 3.])` to `array([5.e-324, 2.e-323, 3.e+000])`. Numpy version 1.24.3 – Mark Jul 19 '23 at 08:33
  • `np.sqrt(a, where=(a>5), out=np.zeros(a.shape))`), – hpaulj Jul 19 '23 at 08:37
  • Thank you for your comments everyone! @hpaulj : This returns "array([0., 0., 3.])" but I would like to get "array([1., 4., 3.])". How should I do that? – Antoine101 Jul 19 '23 at 08:48
  • @hpaulj still seems buggy to me, the `out` parameter is not explicitly described as mandatory, from the [doc](https://numpy.org/doc/stable/reference/generated/numpy.sqrt.html), related to `out`: "*If not provided or None, a freshly-allocated array is returned.*". If `out` is mandatory when `where` is used, then failing to provide it should return an error. Especially since the function both assigns the values to `out` **and** returns a reference to `out` as output. – mozway Jul 19 '23 at 08:50
  • @Antoine101: `np.where((a>5), np.sqrt(a), a)`? – mozway Jul 19 '23 at 08:51
  • @mozway, 'freshly allocated' as in `np.empty(a.shape))`. The non-where values are unpredicable. – hpaulj Jul 19 '23 at 08:52
  • @hpaulj that's what I was wondering, then indeed the `out=None` behavior seems useless as `where` implies that you have non filled values while `empty` shouldn't be used in such case. Interesting pitfall. – mozway Jul 19 '23 at 09:02
  • You have to provide an `out` that has the non-where values you want. But with `sqrt` be cautious with the dtype. – hpaulj Jul 19 '23 at 09:32

2 Answers2

2

I think there might be a bug inconsistency*, when repeating the command several times I don't get consistent results.

*this is actually due to the probable use of numpy.empty to initialize the output. This function reuses memory without setting the default values and thus will have non-predictable values in the cells that are not manually overwritten.

Here is an example:

import numpy as np
print(np.__version__)

a = np.array([1, 4, 9])                   # integers
print(np.sqrt(a, where=(a>5)).round(3))

a = np.array([1., 4., 9.])                # floats
print(np.sqrt(a, where=(a>5)).round(3))

a = np.array([1, 4, 9])                   # integers again
print(np.sqrt(a, where=(a>5)).round(3))

Output:

1.24.2
[0. 0. 3.]
[0. 0. 3.]
[1. 4. 3.]  # this is now different

From the remark of @hpaulj, it seems that providing out is required. Indeed, this prevents the inconsistent behavior with the above example.

out should be initialized with a predictable function, such as numpy.zeros.

import numpy as np
print(np.__version__)

out = np.zeros(3)
a = np.array([1, 4, 9])      # integers
print(np.sqrt(a, where=(a>5), out=out))
print(out)

out = np.zeros(3)
a = np.array([1., 4., 9.])   # floats
print(np.sqrt(a, where=(a>5), out=out))
print(out)

a = np.array([1, 4, 9])      # integers again
print(np.sqrt(a, where=(a>5), out=out))
print(out)

Output:

1.24.2
[0. 0. 3.]
[0. 0. 3.]
[0. 0. 3.]
[0. 0. 3.]
[0. 0. 3.]
[0. 0. 3.]

Nevertheless, this seems to be an inconsistent behavior.

The doc specifies that it out is not provided, a new array should be allocated:

out ndarray, None, or tuple of ndarray and None, optional

A location into which the result is stored. If provided, it must have a shape that the inputs broadcast to. If not provided or None, a freshly-allocated array is returned. A tuple (possible only as a keyword argument) must have length equal to the number of outputs.

where array_like, optional

This condition is broadcast over the input. At locations where the condition is True, the out array will be set to the ufunc result. Elsewhere, the out array will retain its original value. Note that if an uninitialized out array is created via the default out=None, locations within it where the condition is False will remain uninitialized.

In your case you rather want to use numpy.where:

out = np.where((a>5), np.sqrt(a), a)

# or, to avoid potential errors if you have forbidden values
out = np.where(a>5, np.sqrt(a, where=a>5), a)

# or
out = np.sqrt(a, where=a>5, out=a.astype(float))

Output: array([1., 4., 3.])

mozway
  • 194,879
  • 13
  • 39
  • 75
  • This use of `where` requires an `out` as well. – hpaulj Jul 19 '23 at 08:36
  • @hpaulj it's not so clear from the docs, the description of out seems to suggest it's optional. In any case, why is the output not consistent across runs? – mozway Jul 19 '23 at 08:40
  • Yes I had the hint that out might play a role but i don't fully understand it. And it doesn't explain the inconsistencies between runs. – Antoine101 Jul 19 '23 at 08:53
  • 1
    @Antoine101 looks like [`numpy.empty`](https://numpy.org/doc/stable/reference/generated/numpy.empty.html) is used to initialize the output array, this function is marginally faster than `numpy.zeros` but also a well known source of errors if you don't make sure that all values are manually instantiated. Indeed, upon creation of the array it reuses a random block of memory and the default values will thus be arbitrary and not predictable. – mozway Jul 19 '23 at 09:04
  • I think the `ufunc` `where/out` parameters are most useful when you want to prevent run-time warnings like divide by 0. – hpaulj Jul 19 '23 at 09:40
  • @hpaulj yes, which seems to be part of what OP needs, I added some examples at the bottom of my answer. `np.sqrt(a, where=a>5, out=a.astype(float))` might be the shortest approach as it creates a copy, modifies only the desired indices and avoids runtime warnings, what do you think? – mozway Jul 19 '23 at 09:52
  • 1
    It seems to work fine on my machine! Thanks @mozway – Antoine101 Jul 19 '23 at 13:21
1

There seems to be a problem with ufuncs because of default out parameter will create empty uninitialized array (which is probably what causes this error) - there is more detailed description here: "where" clause in numpy-1.13 ufuncs.

From numpy docs (https://numpy.org/doc/stable/reference/ufuncs.html):

If ‘out’ is None (the default), a uninitialized return array is created. The output array is then filled with the results of the ufunc in the places that the broadcast ‘where’ is True. If ‘where’ is the scalar True (the default), then this corresponds to the entire output being filled. Note that outputs not explicitly filled are left with their uninitialized values.

...

Note that if an uninitialized return array is created, values of False will leave those values uninitialized.

Workaround is to add an explicit out parameter.

For example:

>>> a = np.array([1, 4, 9])
>>> np.sqrt(a, where=(a>5))
array([5.e-324, 2.e-323, 3.e+000])

>>> o = np.zeros(3)
>>> np.sqrt(a, where=(a>5), out=o)
array([0., 0., 3.])

>>> a = np.array([1., 4., 9.])
>>> np.sqrt(a, where=(a>5))
array([5.e-324, 2.e-323, 3.e+000])

>>> o = np.zeros(3)
>>> np.sqrt(a, where=(a>5), out=o)
array([0., 0., 3.])

To get what you (probably) originally wanted, numpy.where may be a good choice:

>>> a = np.array([1, 4, 9])
>>> np.where(a > 5, np.sqrt(a), a)
array([1., 4., 3.])

>>> a = np.array([1., 4., 9.])
>>> np.where(a > 5, np.sqrt(a), a)
array([1., 4., 3.])
Jan Spurny
  • 5,219
  • 1
  • 33
  • 47
  • Thank you for your reply @Jan Spurny. It's weird that even without the out parameter I don't get the same result as you: "array([5.e-324, 2.e-323, 3.e+000])" I would like to get : array([1., 4., 3.]) instead of array([0., 0., 3.]) How would you do? – Antoine101 Jul 19 '23 at 08:52
  • @Antoine101 you get different results because uninitialized values will be different on every machine and probably even on the same one. As for how to do what you originally wanted, I'll update the answer – Jan Spurny Jul 19 '23 at 09:06
  • Ok I get it! Thanks for the explanation. I tried with np.where as you suggest, creating two masks as follows, but I get an error message, although the result is correct: bool_1 = a > 5 bool_2 = a < 5 np.where(bool_1, -np.sqrt(a), a) -> array([ 1., 4., -3.]) np.where(bool_2, np.sqrt(a), a) -> array([ 1., 2., -3.]) BUT "RuntimeWarning: invalid value encountered in sqrt a = np.where(bool_2, np.sqrt(a), a)" – Antoine101 Jul 19 '23 at 09:15
  • 1
    Then you need `np.where(a>5, np.sqrt(a, where=a>5), a)` (you can reuse the same mask). Not `out` needed here as the unpredictable values will not be used. – mozway Jul 19 '23 at 09:18
  • @Antoine101 this is a different error, it would probably be better to create a new question rather than solving it in the comments here – Jan Spurny Jul 19 '23 at 09:21
  • @Antoine101 (but it seems you may have a problem with negative values - i.e `-3.`) – Jan Spurny Jul 19 '23 at 09:24
  • `np.where` controls the returned values, but does not prevent erroneous evaluations, such as `sqrt(-1)` or divide by 0. – hpaulj Jul 19 '23 at 09:38
  • That 2017 SO link explains why pairing the `where` and `out` together is so natural to me; I've been commenting this issue for a number of years. The `out` is useful without the `where`. It's used to control the creation, or not, of temporary buffers. The source code for `np.cross` is nice example of this use. – hpaulj Jul 19 '23 at 15:49