
I need to compute the integral of the following function within ranges that start as low as -150:

import numpy as np
from scipy.special import ndtr

def my_func(x):
    return np.exp(x ** 2) * 2 * ndtr(x * np.sqrt(2))

The problem is that this part of the function

np.exp(x ** 2)

tends toward infinity -- `np.exp` overflows and returns `inf` for values of `x` less than approximately `-26` (at that point `x ** 2` exceeds ~709, the largest argument a float64 `exp` can handle).

And this part of the function

2 * ndtr(x * np.sqrt(2))

which is equivalent to

from scipy.special import erf

1 + erf(x)

tends toward 0.

So, a very, very large number times a very, very small number should give me a reasonably sized number -- but instead Python gives me `nan`, because `inf * 0` is undefined.

What can I do to circumvent this problem?
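
For concreteness, here is a minimal reproduction of what happens at the bottom of the range:

```python
import numpy as np
from scipy.special import ndtr

x = -150.0
big = np.exp(x ** 2)                # overflows to inf (with a RuntimeWarning)
small = 2 * ndtr(x * np.sqrt(2))    # underflows to 0.0
print(big * small)                  # inf * 0.0 -> nan
```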

abcd
  • Are you sure there are no analytical solutions to your integral? – Reblochon Masque Aug 31 '15 at 01:59
  • @ReblochonMasque no, i'm not. do you know where i might find one? i certainly don't have the math chops to figure it out on my own. – abcd Aug 31 '15 at 02:00
  • mathstackexchange maybe - or wolframalpha - or sympy – Reblochon Masque Aug 31 '15 at 02:01
  • Does something like this help? `np.exp(x**2 + np.log(2) + np.log(ndtr(x*np.sqrt(2))))` – askewchan Aug 31 '15 at 02:05
  • @askewchan sort of. the `nan` values are replaced with zeros. where there was a drop from `0.02` to `nan`, now it's `0.02` to `0`. what it should be is a very gradual descent from `0.02` closer and closer to `0`. – abcd Aug 31 '15 at 02:10
  • Does that substantially affect the value of the integral? Alternatively, you may be able to figure out analytically what `log(ndtr(x))` is, and break it up like for the `exp` term.. you get the idea. – askewchan Aug 31 '15 at 02:13
  • @askewchan ah, good idea. it does substantially affect the integral -- because i'm integrating in some cases entirely within the `0` zone. – abcd Aug 31 '15 at 02:14
  • @ReblochonMasque wolframalpha gives me something involving the "generalized hypergeometric function." is that implementable in python? i would need `hyp2f2`, but it looks as though `scipy` doesn't have that. but, now i see that `mpmath` does . . . whatever that is. – abcd Aug 31 '15 at 02:15
  • You could eliminate the `log` call in @askewchan's solution by using [`scipy.special.log_ndtr`](http://docs.scipy.org/doc/scipy/reference/generated/scipy.special.log_ndtr.html) – ali_m Aug 31 '15 at 02:18
  • @ali_m, lovely :) `x=np.arange(-100,10,25); np.exp(x**2 + np.log(2) + log_ndtr(x*np.sqrt(2))) --> [0.00564161, 0.00752186, 0.01128154, 0.02254957, 1. ]` – askewchan Aug 31 '15 at 02:19
  • It probably is, but if Ali_m solution is correct, I suggest you use that instead. :) – Reblochon Masque Aug 31 '15 at 02:31

3 Answers


I think a combination of @askewchan's solution and scipy.special.log_ndtr will do the trick:

import numpy as np
from scipy.special import ndtr, log_ndtr

_log2 = np.log(2)
_sqrt2 = np.sqrt(2)

def my_func(x):
    return np.exp(x ** 2) * 2 * ndtr(x * np.sqrt(2))

def my_func2(x):
    return np.exp(x * x + _log2 + log_ndtr(x * _sqrt2))

print(my_func(-150))
# nan

print(my_func2(-150))
# 0.0037611803122451198

For x <= -20, log_ndtr(x) computes the log CDF directly using a series expansion of the error function, which is much more numerically stable than simply taking log(ndtr(x)) -- there, ndtr underflows to 0 and the log becomes -inf.
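
To see the difference, compare the naive log against `log_ndtr` at a very negative argument:

```python
import numpy as np
from scipy.special import ndtr, log_ndtr

x = -150 * np.sqrt(2)
print(np.log(ndtr(x)))   # ndtr underflows to 0.0, so this is -inf
print(log_ndtr(x))       # finite: about -22506
```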


Update

As you mentioned in the comments, the exp can also overflow if x is sufficiently large. Whilst you could work around this using mpmath.exp, a simpler and faster method is to cast up to an np.longdouble (exposed as np.float128 on platforms that support it) which, on my machine, can represent values up to 1.189731495357231765e+4932:

import mpmath

def my_func3(x):
    return mpmath.exp(x * x + _log2 + log_ndtr(x * _sqrt2))

def my_func4(x):
    return np.exp(np.float128(x * x + _log2 + log_ndtr(x * _sqrt2)))

print(my_func2(50))
# inf

print(my_func3(50))
# mpf('1.0895188633566085e+1086')

print(my_func4(50))
# 1.0895188633566084842e+1086

%timeit my_func3(50)
# The slowest run took 8.01 times longer than the fastest. This could mean
# that an intermediate result is being cached.
# 100000 loops, best of 3: 15.5 µs per loop

%timeit my_func4(50)
# The slowest run took 11.11 times longer than the fastest. This could mean
# that an intermediate result is being cached.
# 100000 loops, best of 3: 2.9 µs per loop
ali_m
  • Probably a minor note for this use case, but for scalars, `math.log` and `math.sqrt` are about ten times faster than `np.log` and `np.sqrt`. Or even faster, `log2 = math.log(2)` outside of the function definition. For me this gives about a factor of two speedup for the call to `quad` – askewchan Aug 31 '15 at 02:36
  • @askewchan good point - I wasn't really thinking about performance – ali_m Aug 31 '15 at 02:43
  • this is great. many thanks. i'm testing it now. `my_func2(50)` raises a `RunTimeWarning`: "overflow encountered in exp." it appears that `np.exp` can't handle inputs larger than `709`. (same for `math.exp`.) any idea how i can get around that? – abcd Aug 31 '15 at 03:36
  • i'm going to look into an analytical solution to the integral, because it seems that the overflow is a dead end. – abcd Aug 31 '15 at 03:43
  • @SalvadorDali i'm not sure what you're asking. all i did was `my_func2(50)`. i didn't calculate `integral(log(my_func2(50)))` or intend to calculate that. i guess my assumption was that `quad` wouldn't work if the function itself couldn't be evaluated at that point. i think i tried `quad` first, and it failed. (or was your question for @ali_m?) – abcd Aug 31 '15 at 04:38
  • @ali_m using `mpmath.exp` instead of `np.exp` when `x` is large resolves all my problems. you might want to update your answer with that information. (or i can do it if you aren't sure what i mean.) many thanks! – abcd Aug 31 '15 at 07:00
  • See @SimonByrne's answer for a very simple approach. – Warren Weckesser Aug 31 '15 at 11:39
  • @dbliss you should accept Simon's answer instead of mine – ali_m Aug 31 '15 at 11:45
  • @SalvadorDali Was your comment directed at me? Mine and askewchan's solution also computes `f(x)` rather than `log(f(x))` - it just uses the fact that `a exp(b) = exp(a + log(b))` to avoid overflow inside the exponent. It's a moot point anyway, since Simon's answer is much better. – ali_m Aug 31 '15 at 14:05
  • @ali_m my comment was related to you. Now I see that I was wrong. Thank you. – Salvador Dali Aug 31 '15 at 18:14
  • @ali_m i've accepted simon's answer, but i still think yours is a valuable contribution to this thread. it greatly enhanced my thinking about this issue. – abcd Aug 31 '15 at 19:56

There already is such a function: `erfcx`. It computes `exp(x ** 2) * erfc(x)` without overflow, so `erfcx(-x)` should give you the integrand you want (note that `1 + erf(x) = erfc(-x)`).
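
As a quick sanity check (a sketch; the integration range below is arbitrary):

```python
import numpy as np
from scipy.special import erf, erfcx
from scipy.integrate import quad

# erfcx(x) = exp(x**2) * erfc(x), evaluated without overflow,
# so erfcx(-x) = exp(x**2) * (1 + erf(x)) -- the original integrand.
x = -5.0
naive = np.exp(x ** 2) * (1 + erf(x))   # still representable at x = -5
print(naive, erfcx(-x))                 # both ~ 0.1107

# at x = -150 the naive form is nan, but erfcx is perfectly happy
val, err = quad(lambda t: erfcx(-t), -150, -100)
print(val)
```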

Simon Byrne
  • `erf(x)` is an odd function: `erf(-x) = -erf(x)`. So `erfc(-x) = 1 - erf(-x) = 1 + erf(x)` (the first equality is the definition of `erfc`, the second uses the odd symmetry). – Warren Weckesser Aug 31 '15 at 19:32
  • Note that by flipping the signs of the arguments, you can replace, for example, `quad(my_func, -14, -4)` with `quad(erfcx, 4, 14)`. – Warren Weckesser Aug 31 '15 at 19:36
  • @WarrenWeckesser thanks. one last question: `erfcx` is defined as `exp(x ** 2) * erfc(x)`. if i do `erfcx(-x)`, doesn't that give me `exp(-x ** 2) * erfc(-x)`, when what i want is `exp(x ** 2) * erfc(-x)`? or, wait, i guess it gives me `exp((-x) ** 2) * erf((-x))`, which *is* what i want. – abcd Aug 31 '15 at 19:39

Not sure how helpful this will be, but here are a couple of thoughts that are too long for a comment.

You need to calculate the integral of 2 \cdot e^{x^2} \cdot \Phi(\sqrt{2}x) (where \Phi is the standard normal CDF, i.e. ndtr), which, as you correctly identified, equals e^{x^2} (1 + erf(x)). Expanding the brackets, you can integrate each of the two terms separately.

\int e^{x^2} \, dx = \frac{\sqrt{\pi}}{2} \, erfi(x)

Scipy has this imaginary error function implemented as `scipy.special.erfi`.

The second part is harder:

\int e^{x^2} \, erf(x) \, dx = \frac{x^2}{\sqrt{\pi}} \, {}_2F_2\left(1, 1; \tfrac{3}{2}, 2; x^2\right)

This is a generalized hypergeometric function. Sadly it looks like scipy does not have an implementation of it, but `mpmath` does (`mpmath.hyp2f2`).

Here I used indefinite integrals without constants; given the limits of integration, it is straightforward to turn these into definite integrals.
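
Assuming the two antiderivatives above, the approach can be sketched with mpmath (`mpmath.erfi` and `mpmath.hyp2f2`), and checked against direct numerical quadrature of the integrand:

```python
import mpmath as mp

# antiderivative of e^(x^2) * (1 + erf(x)), assembled from the two pieces:
# (sqrt(pi)/2) * erfi(x)  +  (x^2 / sqrt(pi)) * 2F2(1, 1; 3/2, 2; x^2)
def F(x):
    x = mp.mpf(x)
    return (mp.sqrt(mp.pi) / 2 * mp.erfi(x)
            + x ** 2 / mp.sqrt(mp.pi) * mp.hyp2f2(1, 1, mp.mpf(3) / 2, 2, x ** 2))

# sanity check against direct numerical integration of the integrand
f = lambda x: mp.exp(x ** 2) * (1 + mp.erf(x))
a, b = -2, 1
print(F(b) - F(a))
print(mp.quad(f, [a, b]))  # should agree
```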

Salvador Dali
  • what seems to be screwing me up with this strategy is that `scipy.special.erfi(50)` evaluates to `inf`. – abcd Aug 31 '15 at 04:03
  • math question for ya: how would i handle the case where my integral is "half definite" -- i.e., i have an upper limit, but my lower limit is `-inf`. i'm not sure how to evaluate the functions in your question *at* `-inf`. – abcd Aug 31 '15 at 04:19
  • @dbliss `scipy.special.erfi(50)` is inf. If `b` is super huge, then most probably `b + a` is also super huge, where `a > 0` (because the second integral is also positive if x > 1). Not really sure how you want to calculate it. – Salvador Dali Aug 31 '15 at 04:27
  • (1) right, but `mpmath.erfi(50)` returns `6 * 10 ** 1083`. given what i end up doing with the integral, it's nice to have an exact representation here. (2) what i need to do is evaluate the integral we've been talking about from `-inf` to `x`, where `x` varies from function call to function call from `-150` to `50`. i'm not sure how to do that. – abcd Aug 31 '15 at 04:30
  • ^ never mind. my two most recent comments are really a separate question having to do with the computation of a double integral. i'm going to try to work this out on my own, and, if i can't, post a new question. – abcd Aug 31 '15 at 04:42