
I'd like to take the natural logarithm of a NumPy array, but skip the 0-entries entirely; i.e., I'd like to implement the convention log(0) = 0 over NumPy arrays.

import numpy as np
foo = np.random.rand(500)
%timeit np.log(foo)
%timeit np.log(foo, where=foo>0)

For the first call, this yields

The slowest run took 12.63 times longer than the fastest. This could mean that an intermediate result is being cached. 100000 loops, best of 3: 2.06 µs per loop

For the second call, we get

The slowest run took 8.35 times longer than the fastest. This could mean that an intermediate result is being cached. 100000 loops, best of 3: 4.31 µs per loop

So avoiding the 0s (even though there are no zeros in this particular array) roughly doubles the cost. This is clearly different if we look at sparse arrays, but is there a more efficient way to avoid zeros even in the non-sparse case?
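One caveat worth spelling out (a sketch of the usual workaround, not something stated in the question itself): when `where=` is used without `out=`, NumPy leaves the masked entries of the freshly allocated result uninitialized. Preallocating a zero array therefore both implements the log(0) = 0 convention and makes the output well defined:

```python
import numpy as np

foo = np.random.rand(500)
foo[::50] = 0.0  # plant some exact zeros to exercise the convention

# Preallocate zeros so the masked entries stay 0, then take the log
# only where the input is strictly positive.
result = np.zeros_like(foo)
np.log(foo, out=result, where=foo > 0)
```

After this, `result` equals `np.log(foo)` at the positive entries and is exactly 0 at the zero entries.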

Chris
  • I can't really reproduce that. And in your example you define `foo` and then use `bar`. – Paul Panzer Jan 13 '18 at 00:55
  • @MitchWheat: you have to consider it in the context of information theory, where we often encounter situations like 0 log 0 in the calculation of entropy. – Chris Jan 13 '18 at 00:56
  • @PaulPanzer: thanks, I corrected the variables. The code runs fine on my side. – Chris Jan 13 '18 at 00:58
  • What I can't reproduce are the timings. Maybe your leftover `bar` happened to be a scalar? – Paul Panzer Jan 13 '18 at 01:00
  • *"you have to consider it in the context of information theory, where we often encounter situations like 0 log 0 in the calculation of entropy."* In that case, take a look at [`scipy.special.xlogy`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.xlogy.html) and [`scipy.special.entr`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.entr.html). – Warren Weckesser Jan 13 '18 at 01:14
  • Timings for me on 1.13.1 show that `where` takes 1.6x longer than without, on your benchmark. – Eric Jan 13 '18 at 02:33
  • For me (1.13.3) it's more like 1.25x. I'm doing `timeit('np.log(a, where=a>0)', setup='a=np.random.rand(500)', globals=globals(), number=100000)` and the same without `where`. – Paul Panzer Jan 13 '18 at 03:10
  • Updated to numpy 1.14 and ran the same tests on the command line instead of Spyder. The difference is still about a factor of 1.7. Running Python 2.7.13 (Anaconda custom (x86_64)). Any ideas on speeding this up? – Chris Jan 13 '18 at 04:04
  • What about @WarrenWeckesser's suggestion? Doesn't that crack it for you? – Paul Panzer Jan 13 '18 at 06:17
  • I'm afraid it doesn't apply in this situation, but the `scipy.special.xlogy` function is generally very interesting. It appears to me that there are no proposals to do the requested operation any faster. I'll wait a few days and otherwise mark my own post as the answer. – Chris Jan 14 '18 at 19:27
  • If you're willing to add numba as a dependency, a post-jitted direct implementation of the manual loop with a branch was within 10% of `np.log` at small N and actually faster for me at large N. Might be worth looking at. – DSM Feb 11 '18 at 16:01
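As a follow-up to the `xlogy` suggestion in the comments above: one way to repurpose it for the plain log(0) = 0 convention (an illustrative trick, not something proposed in the thread) is to pass the positivity mask as the first argument, since `xlogy(x, y) = x * log(y)` is defined to be 0 wherever `x = 0`:

```python
import numpy as np
from scipy.special import xlogy

foo = np.random.rand(500)
foo[::50] = 0.0  # plant some exact zeros

# xlogy(x, y) = x * log(y), with xlogy(0, y) defined as 0.
# Using the boolean mask (foo > 0) as x gives log(foo) at the
# positive entries and exactly 0 at the zero entries.
result = xlogy(foo > 0, foo)
```

Whether this beats the `where=` route in speed would need benchmarking on your own setup; its main advantage is that no entry of the output is ever left uninitialized.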

0 Answers