3

I have a numpy array ssh_sum:

>>> ssh_sum
array([[ 0.,  2.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  2.]])

I wanted to compute the element-wise reciprocal values in this array. Numpy returns different values when I call np.reciprocal repeatedly:

>>> ssh_sum
array([[ 0.,  2.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  2.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  2.]])
>>> np.reciprocal(ssh_sum, where=(ssh_sum > 0.))
array([[  6.90326535e-310,   5.00000000e-001,   1.00000000e+000,
          0.00000000e+000,   1.07034283e-296,   1.33666925e+241],
       [  4.74783847e-309,   1.45260789e-296,   1.00000000e+000,
          5.00000000e-001,   2.13436228e-287,  -3.13188338e-294],
       [  4.85105226e-309,   1.08690709e+171,   4.09521901e+149,
          1.00000000e+000,   2.82730247e-311,   5.00000000e-001]])
>>> np.reciprocal(ssh_sum, where=(ssh_sum > 0.))
array([[ inf,  0.5,  1. ,  inf,  inf,  inf],
       [ inf,  inf,  1. ,  0.5,  inf,  inf],
       [ inf,  inf,  inf,  1. ,  inf,  0.5]])
>>> np.reciprocal(ssh_sum, where=(ssh_sum > 0.))
array([[  6.90326535e-310,   5.00000000e-001,   1.00000000e+000,
          0.00000000e+000,   1.07034283e-296,   1.33666925e+241],
       [  4.74783847e-309,   1.45260789e-296,   1.00000000e+000,
          5.00000000e-001,   2.13436228e-287,  -3.13188338e-294],
       [  4.85105226e-309,   1.08690709e+171,   4.09521901e+149,
          1.00000000e+000,   2.82730247e-311,   5.00000000e-001]])
>>> np.reciprocal(ssh_sum, where=(ssh_sum > 0.))
array([[ inf,  0.5,  1. ,  inf,  inf,  inf],
       [ inf,  inf,  1. ,  0.5,  inf,  inf],
       [ inf,  inf,  inf,  1. ,  inf,  0.5]])

Any idea what's going on here? I'm using Python 3.4.5 and numpy 1.13.3.

Hinton
  • 2,320
  • 4
  • 26
  • 32
  • 2
    If you need a workaround, there's always; `m = ssh_sum > 0 ; ssh_sum[m] = 1 / ssh_sum[m]` – cs95 Mar 24 '18 at 00:27
  • thanks @cᴏʟᴅsᴘᴇᴇᴅ your second comment saved the day – Hinton Mar 24 '18 at 00:30
  • Can you upgrade to 1.14? If this is a bug in your version of numpy, or an install problem with your copy of numpy, who knows whether it will affect anything else you're relying on? – abarnert Mar 24 '18 at 00:32
  • I just upgraded to 1.14, and the behavior is now consistent across calls, but instead of leaving 0 values at zero it's still setting them to some garbage like 4.85105226e-309 – Hinton Mar 24 '18 at 00:32
  • Maybe it's my CPU --- Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz , microcode 0xb00001b – Hinton Mar 24 '18 at 00:33
  • 4.05e-309 is pretty close to 0—as in more than close enough for `np.isclose` with defaults. – abarnert Mar 24 '18 at 00:37

1 Answers1

5

It is not just reciprocal; the issue occurs with any use of the where argument. I've been able to reproduce the issue with the master branch of numpy (np.__version__ is '1.15.0.dev0+c093997'), with functions such as abs, sign, add, subtract, etc.

If you read the docstrings of the numpy "ufuncs" carefully and interpret them correctly, you'll see that the behavior is not a bug. Here are the relevant descriptions from the numpy.reciprocal docstring:

out : ndarray, None, or tuple of ndarray and None, optional
    A location into which the result is stored. If provided, it must have
    a shape that the inputs broadcast to. If not provided or `None`,
    a freshly-allocated array is returned. A tuple (possible only as a
    keyword argument) must have length equal to the number of outputs.
where : array_like, optional
    Values of True indicate to calculate the ufunc at that position, values
    of False indicate to leave the value in the output alone.

Note, in particular:

  • where says "values of False indicate to leave the value in the output alone."
  • out says "If not provided or None, a freshly-allocated array is returned."

You did not provide an out argument, so a new array is allocated by your call to reciprocal. The contents of this array are not initialized; the array holds whatever happened to be in the allocated memory. When you use the where argument, only those positions in the output where where is True are assigned values. Positions where where is False are not touched, so they hold whatever random stuff was there when the array was allocated. For floating point output, the random stuff in the output might be 0.0, 4.85105226e-309, or any other random values.

To use the where argument the way you intended, you should also provide your own out argument, initialized with the values you want in the output where where is False. In your case, you should pass in an array of zeros:

In [84]: ssh_sum
Out[84]: 
array([[0., 2., 1., 0., 0., 0.],
       [0., 0., 1., 2., 0., 0.],
       [0., 0., 0., 1., 0., 2.]])

In [85]: out = np.zeros_like(ssh_sum)

In [86]: np.reciprocal(ssh_sum, where=ssh_sum > 0.0, out=out)
Out[86]: 
array([[0. , 0.5, 1. , 0. , 0. , 0. ],
       [0. , 0. , 1. , 0.5, 0. , 0. ],
       [0. , 0. , 0. , 1. , 0. , 0.5]])

In [87]: out
Out[87]: 
array([[0. , 0.5, 1. , 0. , 0. , 0. ],
       [0. , 0. , 1. , 0.5, 0. , 0. ],
       [0. , 0. , 0. , 1. , 0. , 0.5]])
Warren Weckesser
  • 110,654
  • 19
  • 194
  • 214
  • This is awesome, thanks. I assume the reason I couldn't reproduce it was because my Mac automatically allocated a block of 0s. – cs95 Mar 24 '18 at 16:14