24

I have a big data set of floating point numbers. I iterate through them and evaluate np.log(x) for each of them. I get

RuntimeWarning: divide by zero encountered in log

I would like to get around this and return 0 if this error occurs.

I am thinking of defining a new function:

def safe_ln(x):
    #returns: ln(x) but replaces -inf with 0
    l = np.log(x)
    #if l = -inf:
    l = 0
    return l

Basically, I need a way of testing whether the output is -inf, but I don't know how to proceed. Thank you for your help!

Puck
Julia
  • 5
    I would start by returning an actual variable instead of the nonexistent `result` – Junuxx Nov 21 '12 at 16:42
  • 1
    sorry, i just wrote this as an example :) – Julia Nov 21 '12 at 16:42
  • 1
    Because of the way your question is written ("iterate through the array"), I think you're not using NumPy properly, and what you're doing (and the accepted solution) are many orders of magnitude slower than the common solution. – jorgeca Nov 21 '12 at 17:21
  • is your input from a numpy array (that is: is the argument x in `safe_ln` a value from a numpy array? – bmu Nov 21 '12 at 17:22

8 Answers

40

You are using an np function, so I assume you are working with a numpy array. In that case the most efficient way to do this is to use the np.where function instead of a for loop:

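import numpy as np
myarray = np.random.randint(10, size=10)
# np.log is still evaluated for every element, so zeros trigger the RuntimeWarning
result = np.where(myarray > 0, np.log(myarray), 0)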

Otherwise, you can simply apply the log function and then patch the holes:

myarray= np.random.randint(10,size=10)
result = np.log(myarray)
result[result==-np.inf]=0

The np.log function correctly returns -inf for an input of 0, so are you sure that you want to return 0 instead? If you later have to revert to the original values, you are going to run into problems, because exp(0) = 1 turns the zeros into ones...
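To make the caveat about reverting concrete, here is a minimal sketch. The sample values are made up for illustration, and np.errstate is used only to keep the warning quiet in this block.

```python
import numpy as np

values = np.array([0.0, 1.0, 2.0])  # hypothetical sample data

# Silence the divide-by-zero warning for this block only.
with np.errstate(divide="ignore"):
    logs = np.where(values > 0, np.log(values), 0)

# Mapping back with exp turns the patched zero into a one,
# so the original zero can no longer be told apart from a one.
recovered = np.exp(logs)
print(recovered)
```

If the zeros matter later on, keeping -inf (or a separate boolean mask of the zero positions) preserves that information.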

EnricoGiampieri
  • Excellent! That's pretty much the answer I was writing. Though, it should be `np.where` instead of `where`. For me this is like two orders of magnitude faster, even for small arrays. – jorgeca Nov 21 '12 at 18:17
  • 2
    I feel @EnricoGiampieri 's approach is more correct than the answer you accepted. Nice illustration of numpy.where(), btw :) – Alex I Nov 21 '12 at 19:54
  • 2
    You're right, I got it wrong working with the "from numpy import *" ;) For the gain in speed, that's the power of numpy: as long as you stay inside it, it works at C speed. If you want to feel a real burst of power, try to combine it with the numexpr library, which optimize for cache and multi-processors ;) * from numexpr import evaluate as ev * ev("where(myarray>0, np.log(myarray), 0)") – EnricoGiampieri Nov 21 '12 at 19:56
  • numexpr is amazing at what it does. In this case, it starts being faster than numpy for arrays of size 1000 on, and ends up being 8 times faster from around 1e5 elements, in my laptop. (By the way, it should be `log` inside the string expression, not `np.log`, numexpr is not really using the log function from numpy) – jorgeca Nov 21 '12 at 21:57
  • I ran the first solution and it works properly, but it outputs a warning: RuntimeWarning: divide by zero encountered in log. If I run it again in the same terminal session, the warning doesn't appear. Is this normal? The where is supposed not to run the log if the value is 0! – Roger Veciana Jul 24 '13 at 09:25
  • 2
    This is a common misconception about the where function. It actually creates both arrays, the one for the positive case and the one for the negative case, and only afterwards selects each element from the appropriate one. As strange and suboptimal as it sounds, this is necessary if one of the arrays has values that depend on one another, like a cumsum. Not to mention that you pass the function an already generated array, not a function to generate one... – EnricoGiampieri Jul 24 '13 at 14:40
  • @EnricoGiampieri can you tell me why I am still getting this error. I have weekly sales data where values can be 0 or negative. I am using this formula to calculate log ------ train["log10_Weekly_Sales"] = np.where( train["Weekly_Sales"] >= 1, np.log10(train["Weekly_Sales"]), 0 ) – bhola prasad May 11 '21 at 13:25
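On the last two comments: np.where receives arrays that have already been computed, so the log is still evaluated on the zero and negative entries. One way around this, sketched here under the assumption that the input can be converted to a float array, is to pass the mask to the ufunc itself through its `where` and `out` arguments. The helper name `log_or_zero` is made up for this example.

```python
import numpy as np

def log_or_zero(values):
    """Return ln(v) where v > 0 and 0 elsewhere."""
    values = np.asarray(values, dtype=float)
    result = np.zeros_like(values)                # value kept for masked-out entries
    np.log(values, out=result, where=values > 0)  # log is evaluated only where the mask is True
    return result

print(log_or_zero([0.0, 1.0, -3.0, 10.0]))
```

Because the log is never evaluated on the masked-out entries, no RuntimeWarning should be emitted, and those entries keep the 0 they were initialised with.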
27

Since the log of x=0 is minus infinity, I'd simply check whether the input value is zero and return whatever you want in that case:

import math

def safe_ln(x):
    # log is undefined for x <= 0, so return 0 there instead
    if x <= 0:
        return 0
    return math.log(x)

EDIT: small edit: you should check for all values smaller than or equal to 0.

EDIT 2: np.log is of course meant for computing on a numpy array; for single values you should use math.log. This is how the above function looks with numpy:

def safe_ln(x, minval=0.0000000001):
    return np.log(x.clip(min=minval))
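Note that the clipped version does not return 0 for non-positive inputs; it returns the log of the floor value instead. A small check, with a made-up sample array and the same function restated so the snippet runs on its own:

```python
import numpy as np

def safe_ln(x, minval=1e-10):
    # Floor the input at minval, then take the log.
    return np.log(x.clip(min=minval))

sample = np.array([-2.0, 0.0, 1.0])  # hypothetical input
clipped_logs = safe_ln(sample)
print(clipped_logs)  # the first two entries equal log(1e-10), roughly -23.03
```

Whether a large negative number or a plain 0 is the better substitute depends on what is done with the result afterwards.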
Constantinius
  • 5
    you shouldn't do this with the elements of a numpy array. with numpy you should use vectorized functions and indexing. However it is not clear what type `x` is from the question. – bmu Nov 21 '12 at 17:32
  • Each of your `safe_ln` do a different thing (the last one returns -inf for xi in x if x <= 0) – jorgeca Nov 22 '12 at 13:11
3

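You can do this. Note that np.log only emits a RuntimeWarning by default, so NumPy must be told to raise before there is anything to catch:

def safe_ln(x):
    try:
        # make the divide-by-zero condition raise FloatingPointError
        with np.errstate(divide='raise'):
            l = np.log(x)
    except FloatingPointError:
        l = 0
    return l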
Jeff
3

I like to use sys.float_info.min as follows:

>>> import numpy as np
>>> import sys
>>> arr = np.linspace(0.0, 1.0, 3)
>>> print(arr)
[0.  0.5 1. ]
>>> arr[arr < sys.float_info.min] = sys.float_info.min
>>> print(arr)
[2.22507386e-308 5.00000000e-001 1.00000000e+000]
>>> np.log10(arr)
array([-3.07652656e+02, -3.01029996e-01,  0.00000000e+00])

Other answers have also introduced small positive values, but I prefer to use the smallest positive normalized float (sys.float_info.min), so that the substitute stays as close to zero as possible and the approximation is more accurate.
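For reference, sys.float_info.min is the smallest positive normalized double; subnormal values below it also exist. The sketch below compares the two using np.nextafter. It assumes a platform where subnormals are not flushed to zero, which is the usual default.

```python
import sys
import numpy as np

smallest_normal = sys.float_info.min          # about 2.2e-308
smallest_subnormal = np.nextafter(0.0, 1.0)   # next representable double above 0

# Both are positive, so their logs are finite; the subnormal one is more negative.
print(smallest_subnormal < smallest_normal)
print(np.log(smallest_normal), np.log(smallest_subnormal))
```

Either value avoids -inf; which one to use is a judgment call, since both are arbitrary stand-ins for zero.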

2

The answer given by Enrico is nice, but both solutions result in a warning:

RuntimeWarning: divide by zero encountered in log

As an alternative, we can still use the where function but only execute the main computation where it is appropriate:

import numpy as np

myarray = np.random.randint(10, size=10)

# alternative implementation -- a bit more typing but avoids warnings.
loc = np.where(myarray > 0)
result2 = np.zeros_like(myarray, dtype=float)
result2[loc] = np.log(myarray[loc])

# answer from Enrico (still warns when myarray contains a zero)...
result = np.where(myarray > 0, np.log(myarray), 0)

# check it is giving right solution:
print(np.allclose(result, result2))

My use case was for division, but the principle is clearly the same:

x = np.random.randint(10, size=10)
divisor = np.ones(10,)
divisor[3] = 0 # make one divisor invalid

y = np.zeros_like(divisor, dtype=float)
loc = np.where(divisor>0) # (or !=0 if your data could have -ve values)
y[loc] = x[loc] / divisor[loc]
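The same division can also be written as a single ufunc call, using the `where` and `out` arguments of np.divide. This is a sketch of an alternative, not the approach used above; the input arrays are made up.

```python
import numpy as np

x = np.arange(10)        # hypothetical numerators
divisor = np.ones(10)
divisor[3] = 0           # make one divisor invalid

# Divide only where the divisor is non-zero; other entries keep the 0 from `out`.
y = np.divide(x, divisor, out=np.zeros_like(divisor, dtype=float), where=divisor != 0)
print(y)
```

Supplying `out` matters here: without it, the entries skipped by `where` would be left uninitialised.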
Bonlenfum
1

Use exception handling:

In [26]: import math

In [27]: def safe_ln(x):
   ....:     try:
   ....:         return math.log(x)
   ....:     except ValueError:       # math.log raises this for x <= 0; np.log only warns
   ....:         return float("-inf")
   ....:

In [28]: safe_ln(0)
Out[28]: -inf

In [29]: safe_ln(1)
Out[29]: 0.0

In [30]: safe_ln(-100)
Out[30]: -inf
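The try/except approach works with math.log because it raises, whereas np.log returns -inf and only emits a warning. A small sketch that records both behaviours (the function name `describe_log_of_zero` is made up for this example):

```python
import math
import warnings
import numpy as np

def describe_log_of_zero():
    """Report how math.log and np.log each react to an input of 0."""
    try:
        math.log(0)
        math_outcome = "returned"
    except ValueError:
        math_outcome = "ValueError"

    # Record warnings instead of printing them, so they can be inspected.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        numpy_value = np.log(0.0)

    return math_outcome, numpy_value, [w.category for w in caught]

print(describe_log_of_zero())
```

This is why an except clause around a plain np.log call never fires unless the warning is first turned into an exception.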
Ashwini Chaudhary
0

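You could do this, turning the warning into an exception first so that it can be caught:

import warnings
import numpy as np

def safe_ln(x):
    # returns: ln(x), but 0 where np.log would warn (e.g. x == 0)
    with warnings.catch_warnings():
        warnings.simplefilter("error", RuntimeWarning)  # raise instead of warn
        try:
            l = np.log(x)
        except RuntimeWarning:
            l = 0
    return l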
Cameron Sparr
0

For those looking for an np.log solution that takes an np.ndarray and nudges up only zero values:

import numpy as np

def smarter_nextafter(x: np.ndarray) -> np.ndarray:
    safe_x = np.where(x != 0, x, np.nextafter(x, 1))
    return np.log(safe_x)

def clip_usage(x: np.ndarray, safe_min: float | None = None) -> np.ndarray:
    # Inspiration: https://stackoverflow.com/a/13497931/
    # finfo.tiny is the smallest positive normal value (finfo.min is the most negative one)
    clipped_x = x.clip(min=safe_min or np.finfo(x.dtype).tiny)
    return np.log(clipped_x)

def inplace_usage(x: np.ndarray, safe_min: float | None = None) -> np.ndarray:
    # Inspiration: https://stackoverflow.com/a/62292638/
    x[x == 0] = safe_min or np.finfo(x.dtype).tiny
    return np.log(x)

Or if you don't mind nudging all values and like bad big-O runtimes:

def brute_nextafter(x: np.ndarray) -> np.ndarray:
    # Just for reference, don't use this
    while not x.all():
        x = np.nextafter(x, 1)
    return np.log(x)
Intrastellar Explorer