I have the following problem. I have a function f defined in Python using NumPy functions. The function is smooth and integrable on positive reals. I want to construct the double antiderivative of the function (assuming that both the value and the slope of the antiderivative at 0 are 0) so that I can evaluate it on any positive real smaller than 100.
Definition of antiderivative of f at x: integrate f(s) with s from 0 to x

Definition of double antiderivative of f at x: integrate (integrate f(t) with t from 0 to s) with s from 0 to x
The actual form of f is not important, so I will use a simple one for convenience. But please note that even though my example has a known closed form, my actual function does not.
import numpy as np
f = lambda x: np.exp(-x)*x
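For this particular example the closed form is known (my actual function will not have one), so I can use it as a reference when checking accuracy:

F1 = lambda x: 1 - np.exp(-x)*(x + 1)      # exact antiderivative of f
F2 = lambda x: x - 2 + np.exp(-x)*(x + 2)  # exact double antiderivative of f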
My solution is to construct the antiderivative as an array using naive numerical integration:
N = 10000
delta = 100/N
xs = np.linspace(0,100,N+1)
vs = f(xs)
avs = np.cumsum(vs)*delta
aavs = np.cumsum(avs)*delta
This of course works, but it gives me arrays instead of functions. This is not a big problem, though, as I can interpolate aavs using a spline to get a function and get rid of the arrays.
from scipy.interpolate import UnivariateSpline
aaf = UnivariateSpline(xs, aavs)
The function aaf is approximately the double antiderivative of f.
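Using the exact reference F2 from above, I can measure how accurate this construction is:

err = np.max(np.abs(aaf(xs) - F2(xs)))
print(err)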
The problem is that, even though it works, there is quite a bit of overhead before I can get my function, and precision is expensive.
My other idea was to interpolate f by a spline and take the antiderivative of that; however, this introduces numerical errors that are too big for what I want to use the function for.
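For reference, this is roughly what I mean (a minimal sketch using an interpolating spline; s=0 turns off smoothing):

from scipy.interpolate import UnivariateSpline
fspl = UnivariateSpline(xs, vs, s=0)   # interpolate f itself
aaf2 = fspl.antiderivative(n=2)        # spline representing the double antiderivative
# since xs starts at 0, the value and slope of aaf2 at 0 should both come out as 0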
Is there any better way to do that? By better I mean faster without sacrificing accuracy.
Edit: What I hope is possible is to use some kind of Fourier transform to avoid integrating twice. I hope that there is some convenient transform of vs that allows me to multiply the values component-wise with xs and transform back to get the double antiderivative. I played with this a bit, but I got lost.
Edit: I figured out that using the trapezoidal rule instead of a naive sum increases the accuracy quite a bit. Using Simpson's rule should increase the accuracy further, but it's somewhat fiddly to do with NumPy arrays.
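For concreteness, this is what I mean; cumulative_trapezoid is a drop-in replacement for the cumsum above (recent SciPy versions also have cumulative_simpson, which I have not tried):

from scipy.integrate import cumulative_trapezoid

avs = cumulative_trapezoid(vs, xs, initial=0.0)    # antiderivative values, pinned to 0 at x=0
aavs = cumulative_trapezoid(avs, xs, initial=0.0)  # double antiderivative values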
Edit: As @user202729 rightly complains, this seems off. The reason it seems off is that I have skipped some details. I explain here why what I say makes sense, but it does not affect my question.
My actual goal is not to find the double antiderivative of f, but to find a transformation of it. I have skipped that because I think it only confuses the matter.
The function f decays exponentially as x approaches 0 or infinity. I am minimizing the numerical error in the integration by starting the sum from 0 and going up to approximately the peak of f. This ensures that the relative error is approximately constant. Then I start from the opposite direction, from some very big x, and go back to the peak. Then I do the same for the antiderivative values.
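Roughly, the summation order looks like this (a simplified sketch with the trapezoidal rule, not my actual code; it assumes the peak of f lies inside the grid):

from scipy.integrate import cumulative_trapezoid

peak = int(np.argmax(vs))
# forward accumulation: the small values near 0 are added while the running total is still small
left_avs = cumulative_trapezoid(vs[:peak + 1], xs[:peak + 1], initial=0.0)
# backward accumulation: start at the far tail so the smallest values are summed first
tail = cumulative_trapezoid(vs[peak:][::-1], dx=delta, initial=0.0)[::-1]
# tail[m] is the mass to the right of xs[peak + m]; stitch the two halves together
right_avs = left_avs[-1] + (tail[0] - tail)
avs = np.concatenate([left_avs, right_avs[1:]])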
Then I transform the aavs by another function which is sensitive to numerical errors. Then I find the region where the errors are big (the values oscillate violently) and drop these values. Finally I approximate what I believe are good values by a spline.
Now if I use a spline to approximate f, it introduces an absolute error which is the dominant term on a rather large interval. This gets "integrated" twice and ends up being a rather large relative error in aavs. Then once I transform aavs, I find that the 'good region' has shrunk considerably.
EDIT: The actual form of f is something I'm still looking into. However, it is going to be a generalisation of the lognormal distribution. Right now I am playing with the following family.
I start by defining a generalization of the normal distribution:
from scipy import special

def pdf_n(params, center=0.0, slope=8):
    scale, min, diff = params
    if diff > 0:
        r = min
        l = min + diff
    else:
        r = min - diff
        l = min

    def retfun(m):
        x = (m - center)/scale
        # exponent moves smoothly from l (left tail) to r (right tail)
        E = special.expit(slope*x)*(r - l) + l
        return np.exp(-np.power(1 + x*x, E)/2)

    return np.vectorize(retfun)
It may not be obvious what is happening here, but the result is quite simple. The function decays as exp(-x^(2l)) on the left and as exp(-x^(2r)) on the right. For min=1 and diff=0, this is the normal distribution. Note that this is not normalized. Then I define
g = pdf_n(params)
f = np.vectorize(lambda x: g(np.log(x))/x/area)
where area is the normalization constant.
Note that this is not the actual code I use. I stripped it down to the bare minimum.
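For completeness, the normalization constant can be computed numerically, e.g. along these lines (again a simplification, and the choice of params is just an example):

from scipy import integrate

params = (1.0, 1.0, 0.0)                  # example (scale, min, diff)
g = pdf_n(params)
unnormalized = lambda x: g(np.log(x))/x
area, _ = integrate.quad(unnormalized, 0, np.inf)
f = np.vectorize(lambda x: g(np.log(x))/x/area)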