
I am trying to calculate the ratio w(i) / sum(w(j)), where the weights w are updated by an exponentially decreasing function, i.e. w(i) = w(i) * exp(-k), with k a positive parameter. All the numbers are non-negative. The ratio is then used in a formula (multiplied by a constant, with another constant added). As expected, I soon run into underflow problems.

I guess this happens often, but can someone give me some references on how to deal with it? I did not find an appropriate transformation, so one thing I tried was to set some minimum positive number as a safety threshold, but I did not manage to find out what the minimum positive float is (I am representing numbers as numpy.float128). How can I actually get the minimum positive such number on my machine? The code looks like this:

import numpy as np

w = np.ones(n, dtype='float128')
lt = np.ones(n)
for t in range(T):
    p = (1 - k) * w / w.sum() + (k / n)
    # Process a subset of the n elements, call it set I; j is some range()
    for i in I:
        idx = list(j[i])
        s = p[idx].sum()
        lt /= s
        w[idx] *= np.exp(-k * lt[idx])

where `k` is some constant in (0, 1) and `n` is the length of the array.

user90772
  • `numpy.float128` is a very misleading name - its actual precision is the same as what your C compiler calls a [`long double`](https://en.wikipedia.org/wiki/Long_double), which is 80 bits on x86 (for clarity it's recommended to use the `numpy.longdouble` alias instead). To get the smallest representable positive float you could use [`np.finfo(numpy.float128).tiny`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.finfo.html) which is `3.3621031431120935063e-4932` on my machine, but a more useful threshold would probably be the machine epsilon (`eps`). – ali_m Oct 30 '15 at 17:26
  • Could you show us your code? I suspect that there is probably a better way of computing that ratio. – ali_m Oct 30 '15 at 17:27
  • Thank you @ali_m, I have added a snippet of the code. I could not add it all, but the idea is that I process a different subset of the w's through time and decrease those using np.exp(). – user90772 Nov 02 '15 at 09:47
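As a quick illustration of the first comment's suggestion, the machine limits can be queried with `np.finfo`; the exact values are platform-dependent (80-bit extended precision on x86):

```python
import numpy as np

# Limits of the platform's long double (what numpy calls float128 on x86).
info = np.finfo(np.longdouble)
print(info.tiny)  # smallest positive normal number
print(info.eps)   # spacing between 1.0 and the next representable float
```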

1 Answer


When working with exponentially small numbers it's usually better to work in log space. For example, log(w*exp(-k)) = log(w) - k, which won't have any over/underflow problems unless k is itself exponentially large or w is zero. And, if w is zero, numpy will correctly return -inf. Then, when doing the sum, you factor out the largest term:

import numpy as np

log_w = np.log(w) - k
max_log_w = np.max(log_w)
# Individual terms in the following may underflow, but then they wouldn't
# contribute to the sum anyway.
log_sum_w = max_log_w + np.log(np.sum(np.exp(log_w - max_log_w)))
log_ratio = log_w - log_sum_w

This probably isn't exactly what you want, since you could factor out the k completely (assuming it's a constant and not an array — the exp(-k) then cancels in the ratio), but it should get you on your way.
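To see this numerically, here is a small sketch: the helper `log_ratios` below is just the answer's code wrapped in a function. It agrees with the direct computation for moderate weights, and stays finite for log-weights so negative that `exp(log_w)` would underflow to zero in float64:

```python
import numpy as np

def log_ratios(log_w):
    """Return log(w_i / sum_j w_j) from log-weights, factoring out the max."""
    m = np.max(log_w)
    return log_w - (m + np.log(np.sum(np.exp(log_w - m))))

# Moderate weights: matches the direct ratio.
w = np.array([1.0, 2.0, 3.0])
assert np.allclose(np.exp(log_ratios(np.log(w))), w / w.sum())

# Extreme log-weights where exp(log_w) underflows to 0 in float64:
log_w = np.array([-2000.0, -2001.0, -2005.0])
print(np.exp(log_ratios(log_w)))  # finite, well-defined ratios
```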

Scikit-learn implements a similar thing with `extmath.logsumexp`, but it's basically the same as the above.

clwainwright
    There is also [`numpy.logaddexp`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.logaddexp.html#numpy.logaddexp) and [`scipy.misc.logsumexp`](http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.misc.logsumexp.html) – ali_m Oct 31 '15 at 16:58
  • Didn't know about `scipy.misc.logsumexp`, which seems like it'd be especially useful for the OP. Thanks! – clwainwright Oct 31 '15 at 20:19
  • `np.logaddexp` is also a ufunc, so you can use `np.logaddexp.reduce` to sum along an axis of an array – ali_m Oct 31 '15 at 20:25
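Putting the last comment's suggestion into a sketch: because `np.logaddexp` is a ufunc, its `reduce` method gives log(sum(exp(log_w))) directly, without writing out the max-factoring trick by hand (in recent SciPy versions the equivalent helper lives at `scipy.special.logsumexp`):

```python
import numpy as np

log_w = np.array([-2000.0, -2001.0, -2005.0])

# np.logaddexp computes log(exp(a) + exp(b)) without underflow; reducing
# along the array yields log(sum(exp(log_w))).
log_sum = np.logaddexp.reduce(log_w)
log_ratio = log_w - log_sum
print(np.exp(log_ratio))  # the normalized ratios, summing to 1
```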