numpy: Efficiently avoid 0s when taking log(matrix)

Question

from numpy import *

m = array([[1,0],
           [2,3]])

I would like to compute the element-wise log2(m), but only in the places where m is not 0. Where m is 0, I would like to have 0 as the result.

I am now fighting against:

RuntimeWarning: divide by zero encountered in log2

Try 1: using where

res = where(m != 0, log2(m), 0)

This computes the correct result, but a RuntimeWarning: divide by zero encountered in log2 is still logged. It looks like (and syntactically it is quite obvious) numpy still computes log2(m) on the full matrix, and only afterwards does where pick the values to keep.
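
The eager evaluation can be made visible with a minimal sketch. It is not part of the original question: it assumes the conventional `import numpy as np` alias and spells out the intermediate array under the illustrative name `full`.

```python
import numpy as np

m = np.array([[1, 0], [2, 3]])

# np.where is an ordinary function, so its arguments are evaluated first.
# The warning is silenced here only to keep the demonstration quiet.
with np.errstate(divide="ignore"):
    full = np.log2(m)             # computed for every element, including the 0

res = np.where(m != 0, full, 0)   # the selection happens only afterwards

print(full[0, 1])   # the zero entry was still passed to log2
print(res)
```

Writing `where(m != 0, log2(m), 0)` in one line is equivalent: `log2(m)` is fully evaluated before `where` ever sees it.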

I would like to avoid this warning.

Try 2: using masks

from numpy import ma

res = ma.filled(log2(ma.masked_equal(m, 0)), 0)

Surely masking away the zeros will prevent log2 from being applied to them, won't it? Unfortunately not: we still get RuntimeWarning: divide by zero encountered in log2.

Even though the matrix is masked, log2 still seems to be applied to every element.

How can I efficiently compute the element-wise log of a numpy array without getting division-by-zero warnings?

Of course I could temporarily disable the logging of these warnings using seterr, but that doesn't look like a clean solution.
And sure a double for loop would help with treating 0s specially, but defeats the efficiency of numpy.

Any ideas?

As you prefer. Notice however that using masked arrays is less efficient than momentarily disabling the error. And disabling the specific 'divide by zero' warning does not disable the other problem with calculating the log of a number, which is negative input. That case is captured as an 'invalid value' warning. — gg349, Feb 13 '14 at 12:42
On the other hand, using masked arrays captures the two errors as the same, and may lead you to not notice an error in the input. In other words, a negative number in the input is treated like a zero, and will give zero as a result. — gg349, Feb 13 '14 at 12:48

score 36 · Answer 1 · answered Sep 06 '18 at 17:30

Another option is to use the where parameter of numpy's ufuncs:

import numpy as np
m = np.array([[1., 0], [2, 3]])
# out supplies the value (0) for every position where the log is skipped
res = np.log2(m, out=np.zeros_like(m), where=(m != 0))

No RuntimeWarning is raised, and zeros are introduced where the log is not computed.

mdeff

`out=np.zeros_like(m)` is quite important. The output might look alright when you forget this, but you will be using uninitialized memory. – bers Feb 04 '21 at 15:27
@bers Are you sure? How come the uninitialized memory is all zeros (at least on initial testing)? – Janosh Sep 21 '22 at 03:18
But you're right, it does say so in [the docs](https://numpy.org/doc/stable/reference/generated/numpy.log.html). – Janosh Sep 21 '22 at 03:19
@Casimir I have wondered the same a couple of times. I guess this is what "undefined behavior" means in standards - it may be working, but you must not rely on it. Pretty dangerous in my opinion. For the record, `import numpy as np; m=np.zeros((1000,)); sum(np.log2(m, where=(m!=0)))` gives 0 on Python 3.8.10 on WSL Ubuntu, but 4.53139328456056e-308 on Python 3.10.7 on Windows 10 (same system). – bers Sep 21 '22 at 07:08
If m is of an Integer dtype, e.g. int64, the above gives a TypeError of 'same_kind' casting rule. To fix it simply add dtype float to the zeros_like function: `np.zeros_like(m, dtype='float64')`. This ensures that the zeros have the same data type as the log function outputs. – JStrahl Nov 11 '22 at 13:23
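
Putting the answer and the comments together, a hedged sketch might look like this. The helper name `safe_log2` is hypothetical (it is not a NumPy function); it allocates a float output explicitly so that integer input works and no uninitialized memory is read.

```python
import numpy as np

def safe_log2(m):
    """Element-wise log2 that returns 0 wherever m == 0 (hypothetical helper)."""
    m = np.asarray(m)
    # Explicit float output: avoids the 'same_kind' casting error for integer
    # input and guarantees that skipped positions hold 0 rather than garbage.
    out = np.zeros(m.shape, dtype=np.float64)
    return np.log2(m, out=out, where=(m != 0))

print(safe_log2(np.array([[1, 0], [2, 3]])))   # integer input
print(safe_log2(np.array([0.0, 4.0, 8.0])))    # float input
```

Negative entries are not covered by the `m != 0` mask; they are still passed to `log2` and come back as nan.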

John Zwinck · Accepted Answer · 2014-02-13T12:00:46.887

35

We can use masked arrays for this:

>>> from numpy import *
>>> m = array([[1,0], [2,3]])
>>> x = ma.log(m)
>>> print(x.filled(0))
[[ 0.          0.        ]
 [ 0.69314718  1.09861229]]

edited Feb 13 '14 at 12:00

answered Feb 13 '14 at 11:44

John Zwinck

`q=array([0.0,1.0])` and `log(q.astype(float))` raises the RuntimeWarning on my machine. Also, numpy stops warning me about the division by zero after the first warning in ipython. Maybe you were misled by that? – gg349 Feb 13 '14 at 11:51
@flebool: Haha, you're right, I was misled. I used regular Python (2.7.6) and it too stopped warning after the first try. Oh well, I'll edit my answer shortly. – John Zwinck Feb 13 '14 at 11:54
Ah, `ma` has its own `log2`! Great. – nh2 Feb 13 '14 at 12:28

gg349 · Answer 3 · 2019-02-08T13:10:48.607

Simply disable the warning for that computation:

from numpy import errstate, isneginf, array, log2

m = array([[1,0],[2,3]])
with errstate(divide='ignore'):
    res = log2(m)

And then you can postprocess the -inf if you want:

res[isneginf(res)]=0

EDIT: Here are some comments on the other option, posted in the other answer, which is using masked arrays. You should opt for disabling the error, for the following reasons:

1) Using masked arrays is by far less efficient than momentarily disabling the error, and you asked for efficiency.

2) Disabling the specific 'divide by zero' warning does NOT disable the other problem with calculating the log of a number, which is negative input. Negative input is captured as an 'invalid value' warning, and you will have to deal with it.

On the other hand, using masked arrays treats the two errors as the same, and will lead you to not notice a negative number in the input. In other words, a negative number in the input is treated like a zero, and will give zero as a result. This is not what you asked for.

3) As a last point, and as a personal opinion, disabling the warning is very readable: it is obvious what the code is doing, which makes it more maintainable. In that respect, I find this solution cleaner than using masked arrays.
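
The difference described in point 2 can be seen in a small sketch. The input values are made up for illustration, and `np.ma.log2` stands in for the masked-array approach.

```python
import warnings
import numpy as np

m = np.array([1.0, 0.0, -2.0, 4.0])   # made-up input with a zero and a negative

# Masked arrays mask both the zero and the negative entry, so after filling
# the two cases are indistinguishable.
masked = np.ma.log2(m).filled(0)

# Only 'divide' is ignored here; a warning about the negative entry is still
# expected. Warnings are recorded so they can be printed afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    with np.errstate(divide="ignore"):
        raw = np.log2(m)

print(masked)   # zero and negative input both end up as 0
print(raw)      # zero gives -inf, negative gives nan
for w in caught:
    print(w.message)
```

With the raw result, `isneginf` and `isnan` can tell a zero input from a negative one; after `filled(0)` that information is gone.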

use `with np.errstate(divide='ignore'): res = log2(m)` instead of calling `seterr` twice: you cannot forget to disable ignoring after your calculation, and it will reset to the previous setting afterwards, too (which @gg349 doesn't atm): (yes, I'm very late with this :) — azrdev, Sep 02 '18 at 12:12
@azrdev, absolutely! I have implemented your proposal above. — gg349, Feb 08 '19 at 13:13

score 9 · Answer 4 · answered Feb 13 '14 at 17:11

The masked array solution and the solution that disables the warning are both fine. For variety, here's another that uses scipy.special.xlogy. np.sign(m) is given as the x argument, so xlogy returns 0 wherever np.sign(m) is 0. The result is divided by np.log(2) to give the base-2 logarithm.

In [4]: from scipy.special import xlogy

In [5]: m = np.array([[1, 0], [2, 3]])

In [6]: xlogy(np.sign(m), m) / np.log(2)
Out[6]: 
array([[ 0.       ,  0.       ],
       [ 1.       ,  1.5849625]])

score 3 · Answer 5 · answered Aug 22 '19 at 11:45

Problem

Questions: Feb 2014, May 2012

For an array containing zeros or negative values, np.log emits the respective warnings.

y = np.log(x)
# RuntimeWarning: divide by zero encountered in log
# RuntimeWarning: invalid value encountered in log

Solution

markroxor suggests np.clip; in my example this creates a horizontal floor. gg349 and others use np.errstate and np.seterr; I think these are clunky and do not solve the problem. As a note, np.complex doesn't work for zeros. user3315095 uses indexing with p=0<x, and np.log has this functionality built in via its where/out parameters. mdeff demonstrates this, but replaces the -inf with 0, which for me was insufficient, and it doesn't handle negatives.

I suggest where=0<x with an out array filled with np.nan (or, if needed, -np.inf; the np.NINF alias was removed in NumPy 2.0).

y = np.log(x, where=0<x, out=np.nan*x)

John Zwinck uses the masked-array function np.ma.log. This works but is computationally slower; see App:timeit.

Example

import numpy as np
x = np.linspace(-10, 10, 300)

# y = np.log(x)                         # Old
y = np.log(x, where=0<x, out=np.nan*x)  # New

import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()

App:timeit

Time Comparison for mask and where

import numpy as np
import time
def timeit(fun, xs):
    t = time.time()
    for i in range(len(xs)):
        fun(xs[i])
    print(time.time() - t)

xs = np.random.randint(-10,+10, (1000,10000))
timeit(lambda x: np.ma.log(x).filled(np.nan), xs)
timeit(lambda x: np.log(x, where=0<x, out=np.nan*x), xs)

score 2 · Answer 6 · answered Feb 16 '14 at 03:49

What about the following

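from numpy import *
m = array((-1.0, 0.0, 2.0))
p = m > 0.0                # boolean mask of the strictly positive entries
print('positive=', p)
print(m[p])
res = zeros_like(m)
res[p] = log(m[p])         # take the log only where it is defined
print(res)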

user3315095

This also works! I like the `ma` solution a bit better because it allows me to write a side-effect free expression, and it composes nicely with other functions from `ma`. – nh2 Feb 16 '14 at 13:09

score 2 · Answer 7 · answered Jul 18 '18 at 17:10

You can use something like m = np.clip(m, 1e-12, None) to avoid the log(0) error. This sets the lower bound to 1e-12.
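
A minimal sketch of what this produces (the sample array is assumed for illustration). Note that clipped zeros come out as log2(1e-12), roughly -39.86, rather than 0, and negative entries would be clipped to the same floor.

```python
import numpy as np

m = np.array([[1.0, 0.0], [2.0, 3.0]])   # assumed sample data

clipped = np.clip(m, 1e-12, None)        # every value below 1e-12 becomes 1e-12
res = np.log2(clipped)

print(res)
```

If a result of exactly 0 is required at the zero positions, this approach needs an extra post-processing step.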

markroxor

Diego Alonso · Answer 8 · 2022-10-06T13:13:17.787

0

Just operate on the nonzero elements:

import numpy as np
x = np.arange(10, dtype=float)  # float dtype, otherwise the logs are truncated to integers
x[x != 0] = np.log2(x[x != 0])
print(x)

Bonus. Compute the Kullback-Leibler divergence while avoiding zeroes in the division:

def kldiv(p, q):
    # Keep only the entries where both p and q are nonzero
    mask = (p != 0) & (q != 0)
    return np.sum(p[mask] * np.log(np.divide(p[mask], q[mask])))

edited Oct 06 '22 at 13:13

answered Oct 06 '22 at 12:56

Diego Alonso

This is the same approach as [@user3315095 suggested](https://stackoverflow.com/a/21807038/263061) 8 years ago to this thread (using `m > 0.0` instead of `x != 0`). – nh2 Oct 07 '22 at 13:23
