C++ class with nested expression templates

Question

I want to define a class, called Nested here, that will contains two or more (one here) data members that support arithmetic operations using expression templates, for example an std::valarray. For this class itself, I am defining its own expression templates and I want to "forward" the arithmetic operations down to the members.

A minimal (non)working example is given below:

#include <iostream>
#include <valarray>

template <typename E>
struct NestedExpr {
    operator const E& () const {
        return *static_cast<const E*>(this);
    }
};

template <typename A>
class Nested : public NestedExpr <Nested<A>>{
private:
    A a;
public:
    Nested(const A& _a) : a(_a) {}

    template <typename E>
    inline Nested<A>& operator = (const NestedExpr<E>& _expr) {
        const E& expr(_expr);
        a = expr.get_a();
        return *this;
    }

    inline       A& get_a()       { return a; }
    inline const A& get_a() const { return a; }
};

// ================================================================= //
template <typename ARG, typename S>
class NestedMul : public NestedExpr<NestedMul<ARG, S>> {
public:
    const ARG& arg;
    const S      s;
    NestedMul(const ARG& _arg, S _s) : arg(_arg), s(_s) {}
    inline auto get_a() const { return arg.get_a() * s; };
};

template< typename ARG, typename S>
inline NestedMul<ARG, S> operator * (S s, const NestedExpr<ARG>& arg) {
    return {arg, s};
}

// ================================================================= //
template <typename ARG1, typename ARG2>
class NestedAdd : public NestedExpr<NestedAdd<ARG1, ARG2>> {
public:
    const ARG1& arg1;
    const ARG2& arg2;
    NestedAdd(const ARG1& _arg1, const ARG2& _arg2)
        : arg1(_arg1), arg2(_arg2) {}
    inline auto get_a() const { return arg1.get_a() + arg2.get_a(); };
};

template<typename ARG1, typename ARG2>
inline NestedAdd<ARG1, ARG2> 
operator + (const NestedExpr<ARG1>& arg1, const NestedExpr<ARG2>& arg2) {
    return {arg1, arg2};
}

int main () {
    std::valarray<double> x1 = {4.0};
    std::valarray<double> x2 = {3.0};
    std::valarray<double> x3 = {0.0};
    std::valarray<double> x4 = {0.0};

    auto a = Nested<std::valarray<double>>(x1);
    auto b = Nested<std::valarray<double>>(x2);
    auto c = Nested<std::valarray<double>>(x3);

    // this returns 21
    c  = 2*a  + 3*b;
    std::cout << c.get_a()[0] << std::endl;

    // works as expected, returns 17
    x4 = 2*x1 + 3*x2;
    std::cout <<        x4[0] << std::endl;
}

The output of this program is

21
17

i.e. forwarding the expression down to the member does not seem to provide the expected result obtained directly from using the valarrays.

Any help here is appreciated.

I just noticed using double literals produces the right answer. — Davide, Nov 30 '18 at 20:06

Piotr Skotnicki · Accepted Answer · 2018-12-01T10:50:09.230

In the below function definition:

inline auto get_a() const { return arg.get_a() * s; };

your expected behavior is that auto deduces std::valarray<double>, that is, the result type of a multiplication of std::valarray<double> and int which is a new object that already stores values multiplied by the integer.

This is how operator* is defined [valarray.binary]/p2:

template <class T>
valarray<T> operator*(const valarray<T>&,
                      const typename valarray<T>::value_type&);

However, there's the following paragraph in the standard [valarray.syn]/p3:

Any function returning a valarray<T> is permitted to return an object of another type, provided all the const member functions of valarray<T> are also applicable to this type. This return type shall not add more than two levels of template nesting over the most deeply nested argument type.

This type must be convertible to std::valarray<double>, but itself, for optimization purposes, may not represent the actual result before that conversion happens.

That is, here's the actual type deduced for auto by GCC:

std::_Expr<std::__detail::_BinClos<std::__multiplies
                                 , std::_ValArray
                                 , std::_Constant, double, double>, double>

and here's what Clang uses:

std::__1::__val_expr<std::__1::_BinaryOp<std::__1::multiplies<double>, 
                     std::__1::valarray<double>, std::__1::__scalar_expr<double> > >

In other words, you are returning by value an object which probably defers the actual computations. In order to do so, those intermediate objects need to store somehow the deferred subexpressions.

Inspecting GCC libstdc++'s implementation, one can find the following representation:

template <class _Oper, class _FirstArg, class _SecondArg>
class _BinBase
{
public:
    typedef typename _FirstArg::value_type _Vt;
    typedef typename __fun<_Oper, _Vt>::result_type value_type;

    _BinBase(const _FirstArg& __e1, const _SecondArg& __e2)
    : _M_expr1(__e1), _M_expr2(__e2) {}

    // [...]

private:
    const _FirstArg& _M_expr1;
    const _SecondArg& _M_expr2;
};

Note that subexpressions are stored as references. This means that in the definition of get_a():

return arg1.get_a() + arg2.get_a();

_M_expr1 and _M_expr2 are bound to temporary objects:

arg1.get_a()
arg2.get_a()

i.e., intermediate objects which are the results of multiplications, whose lifetime ends as soon as NextedAdd::get_a() exits, leading to undefined behavior when the result is eventually computed, in particular, when the implementation attempts to access each individual element of that intermediate subexpressions:

value_type operator[](size_t __i) const
{
    return _Oper()(_M_expr1[__i], _M_expr2[__i]);
}

A quick solution would be to use the following return type:

std::decay_t<decltype(arg.get_a())> get_a() const { return arg.get_a() * s; }

This will recursively ensure that the final result type of any operation will be whatever the original type T in Nested<T> was, i.e., std::valarray<double>.

DEMO

Thanks. While this solves my problem, performance could still be improved. See timings (https://wandbox.org/permlink/N0GkSxnXsL5EZffO) — Davide, Dec 01 '18 at 11:45
@Davide that's why `valarray` defers computations. you'd need to ensure that `get_a` results live as long as your `Nested*` instances — Piotr Skotnicki, Dec 01 '18 at 11:49
Does this means that computations are not actually deferred and this code results in many temporary valarrays being constructed, populated and deleted? — Davide, Dec 01 '18 at 11:59
@Davide not really, you defer computations, but each subexpression eventually results in `valarray`, while `valarray` itself would defer all subexpressions to be evaluated only on final assignment — Piotr Skotnicki, Dec 01 '18 at 12:01
I see, thanks. Any ideas how to avoid that? I guess the code structure I currently have can't prevent that. — Davide, Dec 01 '18 at 12:03
@Davide you'd probably have to change the way `get_a` works. maybe change the bottom-up traversal to top-down ? — Piotr Skotnicki, Dec 01 '18 at 12:15
OK. I will accept your answer as it solves the original question (BTW any useful example of how to change the traversal?) — Davide, Dec 01 '18 at 12:25
@Davide in hindsight, I don't think that would be feasible, because those intermediate classes are not default constructible. I can't think of any solution right now — Piotr Skotnicki, Dec 01 '18 at 12:34
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/184556/discussion-between-davide-and-piotr-skotnicki). — Davide, Dec 01 '18 at 12:36
@Davide [here's](https://wandbox.org/permlink/7iWOg4bqBvq3u0x7) an extremely ugly and risky solution — Piotr Skotnicki, Dec 01 '18 at 13:05

C++ class with nested expression templates

1 Answers1