
I am writing a function f to be used in a Runge Kutta integrator.

output RungeKutta(function f, initial conditions IC, etc.)

Since the function will be called many times, I am looking for a way to generate the function f at compile time.

In this case, f depends on a fixed parameter vector p, where p is sparse and fixed before the code is compiled. To be concrete,

double f(const vector<double>& x) {
    return x dot p;  // pseudocode: the dot product of x and p
}

Since p is sparse, taking the full dot product in f is not the most efficient approach. Hard-coding x dot p seems to be the way to go, but p can be very long (around 1000 elements).

What are my options?

Is writing another program (taking p as input) to generate a .cpp file my only option?

Thanks for the comments. Here is a more concrete example for the differential equation.

dy/dx = f_p(x)

One example for f_p(x):

p = [0, 1, 0]; x = [x1, x2, x3]

double f_p(const vector<double>& x) {
     return x[1];  // i.e. x2 -- this is what I meant by hard-coding
}

instead of:

   double f(const vector<double>& p, const vector<double>& x) {
       double r = 0;
       for (size_t i = 0; i < p.size(); i++) {
           r += p[i] * x[i];
       }
       return r;
   }
  • The syntax `` is new to me. Assuming it is some sort of non-trivial object, you probably want to pass it by `const` reference for starters. How is `p` specified? – Dietmar Kühl Jan 01 '14 at 01:18
  • Sorry, I was trying to write pseudo code. p is just a list of numbers fixed ahead of time. – user3150434 Jan 01 '14 at 01:22
  • Could you provide a more complete example? I think you *can* do this with metaprogramming, but maybe there's a simpler way. – dyp Jan 01 '14 at 01:24
  • If `p` is sparse, how can it be just a list of numbers? ... or are you saying that potentially many of the values are actually just zeros? If that's the case what percentage is normally zero? – Dietmar Kühl Jan 01 '14 at 01:29
  • I am trying to numerically integrate the Optical Bloch Equation. It is a first-order differential equation over a matrix. Since most Runge Kutta integrators only take a vector instead of a matrix as input, I need a program to transform the matrix into a vector. The function f above is the term involving the Hamiltonian, which is fixed before integration. – user3150434 Jan 01 '14 at 01:30
  • approximately 10% of p is nonzero. p is actually a matrix. I simplified my problem into a vector. – user3150434 Jan 01 '14 at 01:31
  • I studied math and have a vague idea what you are talking about but you'll need to translate this into more software terms. ... and note that simplifying things when you want an efficient solution may interfere severely with what can be done. – Dietmar Kühl Jan 01 '14 at 01:32
  • Can the example above be understood by software engineers now? – user3150434 Jan 01 '14 at 01:39
  • Uh, I would like to understand BOTH the question and the answer. Are you saying that you want a function which recognizes an operation that can be replaced with a computationally cheaper one, and performs the cheaper one in order to return the same result? – Sean Jan 01 '14 at 02:25

3 Answers


The key problem you are trying to solve is that a "leaf" function in your calculation, one that will be called many times, will most often do no work given the problem domain. The hope is that this redundant work (namely, multiplying a value by an element of an array known at compile time to be zero) can be collapsed away as part of a compile-time step.

C++ has language facilities to deal with this, namely template metaprogramming. C++ templates are very powerful (i.e., Turing-complete) and allow for things like recursive calculations based on compile-time constants.

Below is an example of how to implement your example using templates and template specialization (you can also find a runnable example I've created here: http://ideone.com/BDtBt7). The basic idea behind the code is to generate a type with a static function that returns the dot product of an input vector of values and a compile-time constant array. The static function recursively calls instances of itself, passing a lower index as it moves through the arrays of elements. It is also templated on whether the value currently being examined in the compile-time constant array p is zero; if it is, calculating that term can be skipped entirely before moving on to the next value in the recursion. Lastly, there are base cases that stop the recursion once the first element of the array is reached.

#include <array>
#include <iostream>
#include <vector>

constexpr std::array<double, 5> p = { 1.0, 0.0, 3.0, 5.0, 0.0 };

template<size_t index, bool isZero>
struct DotProductCalculator
{
    static double Calculate(const std::vector<double>& xArg)
    {
        return (xArg[index] * p[index]) 
            + DotProductCalculator<index - 1, p[index - 1] == 0.0>::Calculate(xArg);
    }
};

template<>
struct DotProductCalculator<0, true>
{
    static double Calculate(const std::vector<double>& xArg)
    {
        return 0.0;
    }
};

template<>
struct DotProductCalculator<0, false>
{
    static double Calculate(const std::vector<double>& xArg)
    {
        return xArg[0] * p[0];
    }
};

template<size_t index>
struct DotProductCalculator<index, true>
{
    static double Calculate(const std::vector<double>& xArg)
    {
        return 0.0 + DotProductCalculator<index - 1, p[index - 1] == 0.0>::Calculate(xArg);
    }
};

template<typename ArrayType>
double f_p_driver(const std::vector<double>& xArg, const ArrayType& pAsArgument)
{
     // pAsArgument is only needed to deduce the array's size at compile time
     return DotProductCalculator<std::tuple_size<ArrayType>::value - 1,
                            p[std::tuple_size<ArrayType>::value - 1] == 0.0>::Calculate(xArg);
}

int main() 
{
    std::vector<double> x = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    double result = f_p_driver(x, p);
    std::cout << "Result: " << result;
    return 0;
}
Sean Boocock
  • Is there a reason for `0.0 +` in the line `return 0.0 + DotProductCalculator::Calculate(xArg);`? Also, I guess this only improves performance if all calls are inlined by the compiler, which is not guaranteed. Do you know whether current compilers usually do so? –  Jan 01 '14 at 03:58
  • The reason for `0.0 +` in that line is that it is part of template specialization for the case where the value in the p array is known to be zero. Since it is zero, it is irrelevant what the value for the element in the input array is and I skip evaluating it, replacing the multiplication with `0.0 +`. You are correct about inlining being necessary for performance here and to the best of my knowledge all modern C++ compilers (MSVC, GCC, Clang, ICC) will inline these calls. – Sean Boocock Jan 01 '14 at 04:06
  • The speed of your approach comes solely from the fact that `p` is declared `constexpr`; the metaprogramming only serves to make the function less generic. [This program @coliru demonstrates there is no statistically significant speedup vs. simply using `std::inner_product`](http://coliru.stacked-crooked.com/a/5822eba4b6038f84). – Casey Jan 01 '14 at 06:31
  • @Casey Your test is kind of unfair, since the premise of the question was a sparse array with only 10% non-zero values and a vector length of 1000. I made a test-case: http://coliru.stacked-crooked.com/a/156fd1944302c508 –  Jan 01 '14 at 06:54

You say in the comments that p really is a row or column of a matrix, and that the matrix is sparse. I'm not familiar with the specific physical problem you are solving, but often, sparse matrices have a fixed diagonal "banding" structure of some kind, e.g.:

| a1 b1  0  0  0  0  0 d1 |
| c1 a2 b2  0  0  0  0  0 |
| 0  c2 a3 b3  0  0  0  0 |
| 0  0  c3 a4 b4  0  0  0 |
| 0  0  0  c4 a5 b5  0  0 |
| 0  0  0  0  c5 a6 b6  0 |
| 0  0  0  0  0  c6 a7 b7 |
| e1 0  0  0  0  0  c7 a8 |

The most efficient way to store such matrices tends to be to store the diagonals as arrays/vectors, so:

A = [a1, a2, a3, a4, a5, a6, a7, a8]
B = [b1, b2, b3, b4, b5, b6, b7]
C = [c1, c2, c3, c4, c5, c6, c7]
D = [d1]
E = [e1]

Multiplying a row-vector X = [x1, x2, x3, x4, x5, x6, x7, x8] by the above matrix thus becomes:

Y = X . M

Y[0] =               X[0] * A[0] + X[1] * C[0] + X[7] * E[0]
Y[1] = X[0] * B[0] + X[1] * A[1] + X[2] * C[1]

etc.

or more generally:

Y[i] = X[i-7] * D[i-7] + X[i-1] * B[i-1] + X[i] * A[i] + X[i+1] * C[i] + X[i+7] * E[i]

where out-of-range array accesses (index < 0 or >= 8) should be treated as evaluating to 0. To avoid having to test for out-of-bounds everywhere, you can actually store each diagonal and the vector itself in oversize arrays whose leading and trailing elements are filled with zeroes.

Note that this will also be highly cache efficient, as all array accesses are linear.

pmdj

With the given constraints I would create a custom function object which stores the matrix p and computes the operation in its function call operator. I would implement two versions of the function: one which preprocesses the matrix upon construction to "know" where the non-zero elements are and one which just does the operations as stated, accepting that many of the computations just result in 0. The quoted amount of 10% non-zero elements sounds likely to be too dense for the complication from taking advantage of the sparsity to pay off.

Ignoring that p is a matrix and using it as a vector, the version without preprocessing would be something like this:

#include <numeric>  // for std::inner_product
#include <vector>

class dotProduct {
    std::vector<double> p;
public:
    dotProduct(std::vector<double> const& p): p(p) {}
    double operator()(std::vector<double> const& x) const {
        return std::inner_product(p.begin(), p.end(), x.begin(), 0.0);
    }
};
// ...
... RungeKutta(dotProduct(p), initial conditions IC, etc.);

When using C++11 a lambda function could be used instead:

... RungeKutta([=](std::vector<double> const& x) {
        return std::inner_product(p.begin(), p.end(), x.begin(), 0.0);
    }, initial conditions IC, etc.);

For the preprocessing version you'd store a std::vector<std::pair<double, std::size_t>> indicating which indices actually need to be multiplied:

class sparseDotProduct {
    typedef std::vector<std::pair<double, std::size_t>> Vector;
    Vector p;
public:
    sparseDotProduct(std::vector<double> const& op) {
        for (std::size_t i(0), s(op.size()); i != s; ++i) {
            if (op[i]) {
                p.push_back(std::make_pair(op[i], i));
            }
        }
    }
    double operator()(std::vector<double> const& x) const {
        double result(0);
        for (Vector::const_iterator it(p.begin()), end(p.end()); it != end; ++it) {
            result += it->first * x[it->second];
        }
        return result;
    }
};

The use of this function object is just the same although it may be reasonable to keep this object around if p doesn't change.

I would personally expect the non-sparse version to actually outperform the sparse version if 10% of the values are non-zero. However, with these two versions around it should be relatively simple to measure the performance of the different approaches. I wouldn't expect custom-generated code to be substantially better, although it could improve on the computation. If so, it may work to use metaprogramming techniques to create the code, but I doubt that this would be too practical.

Dietmar Kühl