3

Given two reals x and y, I want to compute the following function in python:

log Pr [ x <= t <= y ],

where t is sampled from a normal distribution.

One naive implementation is to use scipy.stats.norm.

np.log(scipy.stats.norm.cdf(y) - scipy.stats.norm.cdf(x))

Unfortunately, this causes an underflow when x and y are far from 0. How can one prevent such a numerical error?

Simon Gibbons
  • 6,969
  • 1
  • 21
  • 34
Pegi
  • 31
  • 1

1 Answers1

2

This problem is much more stable if done in logspace.

The trick is to use scipy.stats.norm.logcdf for values less than zero and scipy.stats.norm.logsf for values larger than zero.

This combined with a stable algorithm for calculating log(exp(y) - exp(x)) gives reasonable results

import numpy as np
from scipy.stats import norm

def log_subtract(x, y):
    return x + np.log1p(-np.exp(y-x))

def lnprob(x, y):
    if x < 0:
        return log_subtract(norm.logcdf(y), norm.logcdf(x))
    else:
        return log_subtract(norm.logsf(x), norm.logsf(y))
Simon Gibbons
  • 6,969
  • 1
  • 21
  • 34