0

I am writing a parallelized version of the algorithm to calculate the weights and abscissae for the Gauss-Hermite quadrature (details of which can be found here or here).

I have the following algorithm to compute the values using GPU acceleration:

template <typename T>
ArrayPair<T> CalculateGaussHermiteWeights( const std::size_t sNumPoints, const T tEps = std::numeric_limits<T>::epsilon() )
{
    const T tPiToQuarter = T( 0.7511255444649425 );

    T tCurrentGuess, tFatherGuess, tGrandfatherGuess;
    std::vector<T> vecInitialGuesses( sNumPoints );
    for( std::size_t s = 0; s < (sNumPoints + 1U) / 2U; ++s )
    {
        if( s == 0 )
        {
            tCurrentGuess = sqrt( T( 2 * sNumPoints + 1 ) ) - T( 1.85575 ) * pow( T( 2 * sNumPoints + 1 ), T( -0.16667 ) );
        }
        else if( s == 1 )
        {
            tFatherGuess = tCurrentGuess;
            tCurrentGuess -= T( 1.14 ) * pow( T( sNumPoints ), T( 0.426 ) ) / tCurrentGuess;
        }
        else if( s == 2 )
        {
            tGrandfatherGuess = tFatherGuess;
            tFatherGuess = tCurrentGuess;
            tCurrentGuess = T( 1.86 ) * tCurrentGuess - T( 0.86 ) * tGrandfatherGuess;
        }
        else if( s == 3 )
        {
            tGrandfatherGuess = tFatherGuess;
            tFatherGuess = tCurrentGuess;
            tCurrentGuess = T( 1.91 ) * tCurrentGuess - T( 0.91 ) * tGrandfatherGuess;
        }
        else
        {
            tGrandfatherGuess = tFatherGuess;
            tFatherGuess = tCurrentGuess;
            tCurrentGuess = T( 2.0 ) * tCurrentGuess - tGrandfatherGuess;
        }

        vecInitialGuesses[s] = tCurrentGuess;
    }

    concurrency::array<T, 1> arrWeights( sNumPoints ), arrAbscissae( sNumPoints, std::begin( vecInitialGuesses ) );
    concurrency::parallel_for_each( arrAbscissae.extent, [=, &arrWeights, &arrAbscissae]( concurrency::index<1> index ) restrict( amp ) {
        T tVal = arrAbscissae[index], tIntermediate;

        std::size_t sNumIterations = 0U;
        T tPolynomial1 = tPiToQuarter, tPolynomial2 = T( 0.0 ), tPolynomial3, tDerivative;
        do {
            for( std::size_t s = 0; s < sNumIterations; ++s )
            {
                tPolynomial3 = tPolynomial2;
                tPolynomial2 = tPolynomial1;
                tPolynomial1 = tVal * concurrency::precise_math::sqrt( T( 2.0 ) / (s + 1U) ) * tPolynomial2 - concurrency::precise_math::sqrt( T( s ) / (s + 1U) ) * tPolynomial3;
            }

            tDerivative = concurrency::precise_math::sqrt( T( 2 * sNumPoints ) ) * tPolynomial2;
            tIntermediate = tVal;
            tVal = tIntermediate - tPolynomial1 / tDerivative;

        } while( concurrency::precise_math::fabs( tVal - tIntermediate ) < tEps && sNumIterations < 10 );

        arrAbscissae[index] = tVal;
        arrAbscissae[concurrency::index<1>( sNumIterations - 1U - index[0] )] = -tVal;

        arrWeights[index] = T( 2.0 ) / (tDerivative * tDerivative);
        arrWeights[concurrency::index<1>( sNumIterations - 1U - index[0] )] = arrWeights[index];
    } );

    return std::make_pair( std::move( arrAbscissae ), std::move( arrWeights ) );
}

However, if I call the algorithm and print out the abscissae and corresponding weights that I get (using n=5 for example), I get the following:

Abscissa                     Weight
2.02499                      0.0133645
0.958991                     0.363567
0.000186652                  0.958224
-0.957991                    0.363567
-2.02499                     0.0133645

However, the correct table would be:

2.02018                      0.0199532
0.958572                     0.393619
0                            0.945309
-0.958572                    0.393619
-2.02018                     0.0199532

As you can see, there is a (small, but definitely non-negligible) discrepancy between the two tables of values. At first I assumed it was due to the inaccuracy of sqrt, but this does not appear to be the case. I then guessed it was due to too few iterations in the Newton's method, however, increasing the number of iterations made no difference, so I'm guessing there is a flaw in the algorithm, however, I cannot find what it is.

Thanks in advance!

Thomas Russell
  • 5,870
  • 4
  • 33
  • 68
  • How do you get the correct values? – doctorlove Aug 23 '14 at 11:24
  • @doctorlove They are in a table in the second-link that I gave (i.e. the Wolfram Mathworld link) for n = {2,...,5}. – Thomas Russell Aug 23 '14 at 11:25
  • Where are the magic numbers like ` 1.85575` coming from and what is `T`? (float, double...?) – doctorlove Aug 23 '14 at 11:30
  • @doctorlove The magic numbers are to form basic guesses for the roots to the Hermite polynomial and came from Numerical Recipes 3rd Ed. `T` in this case was `double`, but `float` returns the same result. – Thomas Russell Aug 23 '14 at 11:32
  • To be honest it looks like you are quite close, and Wolfram may have used slightly more precise or slightly different values. Does your code give the same values as Numerical Recipes> – doctorlove Aug 23 '14 at 11:34
  • @doctorlove I believe it does give the same values as Numerical Recipes (however, I'm not sure that these values are not also erroneous); but I'm just somewhat suspicious because all of the other Numerical Recipes numerical integration algorithms give exact (to machine precision) correspondences to the analytic abscissae and weights. – Thomas Russell Aug 23 '14 at 11:38

0 Answers0