Fastest possible method for the arcsin function on small, arbitrary floating-point values

Question

I need to calculate the arcsine function of small values that are under the form of mpmath's "mpf" floating-point bignums.

What I call a "small" value is for example e/4/(10**7) = 0.000000067957045711476130884...

Here is a result of a test on my machine with mpmath's built-in asin function:

import gmpy2
from mpmath import *
from time import time

mp.dps = 10**6

val=e/4/(10**7)
print "ready"

start=time()
temp=asin(val)
print "mpmath asin: "+str(time()-start)+" seconds"

>>> 155.108999968 seconds

This is a particular case: I work with somewhat small numbers, so I'm asking myself if there is a way to calculate it in python that actually beats mpmath for this particular case (= for small values).

Taylor series are actually a good choice here because they converge very fast for small arguments. But I still need to accelerate the calculations further somehow.

Actually there are some problems:
1) Binary splitting is ineffective here because it shines only when you can write the argument as a small fraction. A full-precision float is given here.
2) arcsin is a non-alternating series, thus Van Wijngaarden or sumalt transformations are ineffective too (unless there is a way I'm not aware of to generalize them to non-alternating series). https://en.wikipedia.org/wiki/Van_Wijngaarden_transformation

The only acceleration left I can think of is Chebyshev polynomials. Can Chebyshev polynomials be applied on the arcsin function? How to?

For such small values `asin(x)` is very close to `x` (first term of the Taylor series). Is this not precise enough? — Pavel Anossov, Aug 14 '13 at 16:50
`math.asin(0.000000067957045711476130884) -> 6.795704571147619e-08` i.e. only the last digit is incorrect. Is an error in the order of `10^-22` important to you? — Bakuriu, Aug 14 '13 at 16:59
The short answer is no. If you work with 10^6 digits many other taylor terms are needed: for e/4/10^7 you still need something around 62500 series terms. @Bakuriu the number I used as the example is the one that mpmath prints. I don't think it is wrong. — user2464424, Aug 14 '13 at 17:01

score 3 · Answer 1 · answered Aug 14 '13 at 18:11

3

Can you use the mpfr type that is included in gmpy2?

>>> import gmpy2
>>> gmpy2.get_context().precision = 3100000
>>> val = gmpy2.exp(1)/4/10**7
>>> from time import time
>>> start=time();r=gmpy2.asin(val);print time()-start
3.36188197136

In addition to supporting the GMP library, gmpy2 also supports the MPFR and MPC multiple-precision libraries.

Disclaimer: I maintain gmpy2.

answered Aug 14 '13 at 18:11

casevh

11,093
1
24
35

Thx didn't knew that. It took me 30 secs to compute the arcsin at the same precision and that's quite impressive! How comes that mpmath is so slow compared to mpfr? Like 5 times slower... – user2464424 Aug 14 '13 at 19:01
The MPFR library is written entirely in C and directly access GMP for fast arithmetic. mpmath library can run as a pure Python library. It only uses gmpy/gmpy2 to access the mpz type as a fast replacement for Python longs. I checked the MPFR source and asin(x) is computed from atan(x/sqrt(1-x^2)). – casevh Aug 15 '13 at 03:43

score 2 · Accepted Answer · answered Aug 16 '13 at 15:12

Actually binary splitting does work very well, if combined with iterated argument reduction to balance the number of terms against the size of the numerators and denominators (this is known as the bit-burst algorithm).

Here is a binary splitting implementation for mpmath based on repeated application of the formula atan(t) = atan(p/2^q) + atan((t*2^q-p) / (2^q+p*t)). This formula was suggested recently by Richard Brent (in fact mpmath's atan already uses a single invocation of this formula at low precision, in order to look up atan(p/2^q) from a cache). If I remember correctly, MPFR also uses the bit-burst algorithm to evaluate atan, but it uses a slightly different formula, which possibly is more efficient (instead of evaluating several different arctangent values, it does analytic continuation using the arctangent differential equation).

from mpmath.libmp import MPZ, bitcount
from mpmath import mp

def bsplit(p, q, a, b):
    if b - a == 1:
        if a == 0:
            P = p
            Q = q
        else:
            P = p * p
            Q = q * 2
        B = MPZ(1 + 2 * a)
        if a % 2 == 1:
            B = -B
        T = P
        return P, Q, B, T
    else:
        m = a + (b - a) // 2
        P1, Q1, B1, T1 = bsplit(p, q, a, m)
        P2, Q2, B2, T2 = bsplit(p, q, m, b)
        T = ((T1 * B2) << Q2) + T2 * B1 * P1
        P = P1 * P2
        B = B1 * B2
        Q = Q1 + Q2
        return P, Q, B, T

def atan_bsplit(p, q, prec):
    """computes atan(p/2^q) as a fixed-point number"""
    if p == 0:
        return MPZ(0)
    # FIXME
    nterms = (-prec / (bitcount(p) - q) - 1) * 0.5
    nterms = int(nterms) + 1
    if nterms < 1:
        return MPZ(0)
    P, Q, B, T = bsplit(p, q, 0, nterms)
    if prec >= Q:
        return (T << (prec - Q)) // B
    else:
        return T // (B << (Q - prec))

def atan_fixed(x, prec):
    t = MPZ(x)
    s = MPZ(0)
    q = 1
    while t:
        q = min(q, prec)
        p = t >> (prec - q)
        if p:
            s += atan_bsplit(p, q, prec)
            u = (t << q) - (p << prec)
            v = (MPZ(1) << (q + prec)) + p * t
            t = (u << prec) // v
        q *= 2
    return s

def atan1(x):
    prec = mp.prec
    man = x.to_fixed(prec)
    return mp.mpf((atan_fixed(man, prec), -prec))

def asin1(x):
    x = mpf(x)
    return atan1(x/sqrt(1-x**2))

With this code, I get:

>>> from mpmath import *
>>> mp.dps = 1000000
>>> val=e/4/(10**7)
>>> from time import time
>>> start = time(); y1 = asin(x); print time() - start
58.8485069275
>>> start = time(); y2 = asin1(x); print time() - start
8.26498985291
>>> nprint(y2 - y1)
-2.31674e-1000000

Warning: atan1 assumes 0 <= x < 1/2, and the determination of the number of terms might not be optimal or correct (fixing these issues is left as an exercise to the reader).

That's what I'm talking about :) So if I got it well in order to calculate the arcsine fast you instead calculate some very optimized arctangent functions. Nowadays everything is calculated with arctangents... — user2464424, Aug 16 '13 at 15:42
It should be possible to construct an analogous algorithm for the arcsine directly, but the arctangent is often more convenient to work with (it has a simpler Taylor series, for one thing). A single algebraic transformation to translate between one function and another is cheap, anyhow. — Fredrik Johansson, Aug 16 '13 at 22:16

score 0 · Answer 3 · answered Aug 14 '13 at 17:03

0

A fast way is to use a pre-calculated look-up table.

But if you look at e.g. a Taylor series for asin;

def asin(x):
    rv = (x + 1/3.0*x**3 + 7/30.0*x**5 + 64/315.0*x**7 + 4477/22680.0*x**9 + 
         28447/138600.0*x**11 + 23029/102960.0*x**13 + 
         17905882/70945875.0*x**15 + 1158176431/3958416000.0*x**17 + 
         9149187845813/26398676304000.0*x**19)
    return rv

You'll see that for small values of x, asin(x) ≈ x.

In [19]: asin(1e-7)
Out[19]: 1.0000000000000033e-07

In [20]: asin(1e-9)
Out[20]: 1e-09

In [21]: asin(1e-11)
Out[21]: 1e-11

In [22]: asin(1e-12)
Out[22]: 1e-12

E.g. for the value us used:

In [23]: asin(0.000000067957045711476130884)
Out[23]: 6.795704571147624e-08

In [24]: asin(0.000000067957045711476130884)/0.000000067957045711476130884
Out[24]: 1.0000000000000016

Of course it depends on whether this difference is relevant to you.

answered Aug 14 '13 at 17:03

Roland Smith

42,427
3
64
94

Thank you. Unfortunately, working-out a number with one million digits requires way more taylor terms. Taylor IS actually very fast here, but since the numbers are 1/big, I'm looking for fancy ways to speed up taylor even further. – user2464424 Aug 14 '13 at 17:14
Hence a look-up table, so you only have to calculate once. – Roland Smith Aug 14 '13 at 17:15
Which is very good for fixed point calculations, so that you can interpolate between nodes to form a rough estimation of the result. Now, I expect the lookup table would be very big in order to be comprehensive for any number from 0 to 10^-7 up to 10^6 digits, isn't it? – user2464424 Aug 14 '13 at 17:49

Fastest possible method for the arcsin function on small, arbitrary floating-point values

3 Answers3