Why haven't modern compilers removed the need for expression templates?

Question

The standard pitch for expression templates in C++ is that they increase efficiency by removing unnecessary temporary objects. Why can't C++ compilers already remove these unnecessary temporary objects?

This is a question that I think I already know the answer to, but I want to confirm it, since I couldn't find a low-level answer online.

Expression templates essentially allow/force an extreme degree of inlining. However, even with inlining, compilers cannot optimize out calls to operator new and operator delete because they treat those calls as opaque since those calls can be overridden in other translation units. Expression templates completely remove those calls for intermediate objects.
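
For context, "replaceable" means that any translation unit in the program may define its own global `operator new`/`operator delete`, and that definition is then used program-wide. Below is a minimal replacement that only counts calls. `g_allocations` is a hypothetical name used for illustration, and the sketch ignores the aligned (C++17) overloads.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter, only for illustration.
std::size_t g_allocations = 0;

// Program-wide replacement of the global allocation function. It is linked in
// from whichever translation unit defines it, so a compiler looking at a
// single translation unit does not see its body.
void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size == 0 ? 1 : size)) return p;
    throw std::bad_alloc{};
}

// Matching replacements for the unsized and sized deallocation functions.
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```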

These superfluous calls to operator new and operator delete can be seen in a simple example where we only copy:

```cpp
#include <array>
#include <vector>

std::vector<int> foo(std::vector<int> x)
{
    std::vector<int> y{x};
    std::vector<int> z{y};
    return z;
}

std::array<int, 3> bar(std::array<int, 3> x)
{
    std::array<int, 3> y{x};
    std::array<int, 3> z{y};
    return z;
}
```

In the generated code, we see that foo() compiles to a relatively lengthy function with two calls to operator new and one call to operator delete while bar() compiles to only a transfer of registers and doesn't do any unnecessary copying.

Is this analysis correct?

Could any C++ compiler legally elide the copies in foo()?

Newer C++ standards (>= C++14, if I recall correctly) allow new/delete calls to be optimized out or reused under certain conditions. So foo() *may* be legally optimized to something equivalent to bar() nowadays ... — Massimiliano Janes, Dec 07 '17 at 08:03
[[expr.new#10](https://timsong-cpp.github.io/cppwp/expr.new#10)] and below ... — Massimiliano Janes, Dec 07 '17 at 08:19
Regarding expression templates, IMO their goal is not so much inlining per se; it's rather exploiting the symmetries of the type system of the domain-specific language the expression models. For example, when you multiply three, say, Hermitian matrices, an expression template can use a space-time optimized algorithm tailored for this use case; compilers don't know algebra ... :) — Massimiliano Janes, Dec 07 '17 at 08:25
I don't think you know what [expression templates](https://en.wikipedia.org/wiki/Expression_templates) are. You have nothing of the sort in the code you've shown. — bolov, Dec 07 '17 at 09:16
@bolov My example is intentionally minimal. The wikipedia example also uses `std::vector<>` internally. I don't think my example would gain anything by adding classes that wrap `std::vector<>` and use expression templates. The example generates the unnecessary copying that expression templates aim to remove (although `std::array<>` isn't a great comparison). — Praxeolitic, Dec 07 '17 at 09:26
@Praxeolitic I still don't see where expression templates come into play in your example. Yes, expression templates do aim at removing temporary objects, but more than that they aim at removing intermediate operations. **But** only within one full expression. They have nothing to do with your example. [cont] — bolov, Dec 07 '17 at 09:39
E.g. a simple piece of code that shows expression templates is this: `Matrix a, b, c, d = ....;` `Matrix r = a + b * c + d;` Expression templates come into play in the expression `a + b * c + d`: instead of computing `b*c`, then `a + temp1`, then `temp2 + d`, this technique delays the arithmetic computation until the assignment. It does this by using templates to build an AST object, known at compile time, for the expression `a + b * c + d`. — bolov, Dec 07 '17 at 09:39
@bolov [Here](https://godbolt.org/g/Yq8BRV) is a new version of my example that uses simple "expression templates". The generated code has one less call to both `operator new` and `operator delete` compared to the previous version. I do know that my code doesn't look like a typical example of expression templates that implements arithmetic operators, but my goal is to give a minimal example that causes unnecessary copying. If you don't like this example, please be specific. I believe that arithmetic operators are beside the point and it would be a distraction to add them. — Praxeolitic, Dec 07 '17 at 10:11
@Praxeolitic you still don't use expression templates. OK, I could be wrong or I could be misinterpreting you. But as far as I can see, you are considering eliding copies equivalent to expression templates. Read the first paragraph of the wiki article again. I will end the discussion here. — bolov, Dec 07 '17 at 10:20
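
The technique bolov describes, where the operators build a compile-time AST and the arithmetic is deferred until assignment, can be sketched for element-wise vector addition as follows. All names (`VecExpr`, `Vec`, `VecSum`) are hypothetical and loosely follow the CRTP style of the Wikipedia article linked above; a multiplication node would be written the same way as `VecSum`.

```cpp
#include <cstddef>
#include <vector>

// CRTP base: marks a type as a vector expression.
template <typename E>
struct VecExpr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Leaf node that owns its storage.
struct Vec : VecExpr<Vec> {
    std::vector<double> data;

    explicit Vec(std::size_t n, double value = 0.0) : data(n, value) {}

    // Evaluation point: the whole expression tree is walked here, element by
    // element, in a single loop and without any temporary Vec.
    template <typename E>
    Vec(const VecExpr<E>& expr) : data(expr.self().size()) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = expr.self()[i];
    }

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Inner node: stores references to its operands; nothing is added yet.
template <typename L, typename R>
struct VecSum : VecExpr<VecSum<L, R>> {
    const L& lhs;
    const R& rhs;
    VecSum(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

template <typename L, typename R>
VecSum<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) {
    return VecSum<L, R>(l.self(), r.self());
}
```

In `Vec r = a + b + c;` the right-hand side has type `VecSum<VecSum<Vec, Vec>, Vec>`, and the loop in `Vec`'s templated constructor computes `a[i] + b[i] + c[i]` directly, so no intermediate `Vec` (and no intermediate allocation) is created. Because the nodes hold references, an expression object must not outlive the full expression that created it.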

Massimiliano Janes · Accepted Answer · 2017-12-07T08:56:45.667

> However, even with inlining, compilers cannot optimize out calls to operator new and operator delete because they treat those calls as opaque since those calls can be overridden in other translation units.

Since C++14 this is no longer true; allocation calls can be optimized out or reused under certain conditions:

> [expr.new#10] An implementation is allowed to omit a call to a replaceable global allocation function. When it does so, the storage is instead provided by the implementation or provided by extending the allocation of another new-expression. [conditions follow]

So foo() may be legally optimized to something equivalent to bar() nowadays ...
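
As a minimal illustration of what [expr.new] permits: in the hypothetical function below the allocated object never escapes, so a compiler may legally drop both the `new` and the `delete` and simply return the constant. Whether a particular compiler actually performs the omission depends on its version and optimization level, so inspecting the generated code is the only reliable check.

```cpp
// The int allocated here never escapes the function, so no caller can
// observe whether its storage really came from ::operator new. Since C++14
// an implementation may omit the allocation/deallocation calls entirely.
int read_back() {
    int* p = new int(42);
    int value = *p;
    delete p;
    return value;
}
```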

> Expression templates essentially allow/force an extreme degree of inlining

IMO the point of expression templates is not so much inlining per se; it's rather exploiting the symmetries of the type system of the domain-specific language the expression models.

For example, when you multiply three, say, Hermitian matrices, an expression template can use a space-time optimized algorithm that exploits the fact that the product is associative and that Hermitian matrices are self-adjoint, resulting in a lower total operation count (and possibly even better accuracy). And all of this is decided at compile time.
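
A sketch of the associativity part of this idea, assuming fixed-size matrices whose dimensions are template parameters (`Mat`, `Prod2`, `Prod3` and `mul` are hypothetical names; exploiting Hermitian symmetry is not shown). Because `a * b * c` produces a type that records all three operands, the cheaper parenthesization, measured in scalar multiplications, can be selected with `if constexpr` before any arithmetic happens.

```cpp
#include <array>
#include <cstddef>

// Fixed-size matrix: dimensions are template parameters, so the cost of each
// multiplication order is a compile-time constant.
template <std::size_t R, std::size_t C>
struct Mat {
    std::array<double, R * C> d{};
    double& operator()(std::size_t i, std::size_t j) { return d[i * C + j]; }
    double operator()(std::size_t i, std::size_t j) const { return d[i * C + j]; }
};

// Ordinary eager product, used once the evaluation order has been chosen.
template <std::size_t R, std::size_t K, std::size_t C>
Mat<R, C> mul(const Mat<R, K>& a, const Mat<K, C>& b) {
    Mat<R, C> out;
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < C; ++j) {
            double s = 0.0;
            for (std::size_t k = 0; k < K; ++k) s += a(i, k) * b(k, j);
            out(i, j) = s;
        }
    return out;
}

// a * b only records its operands; nothing is multiplied yet.
template <std::size_t R, std::size_t K, std::size_t C>
struct Prod2 {
    const Mat<R, K>& a;
    const Mat<K, C>& b;
};

template <std::size_t R, std::size_t K, std::size_t C>
Prod2<R, K, C> operator*(const Mat<R, K>& a, const Mat<K, C>& b) {
    return {a, b};
}

// (a * b) * c: the whole chain is visible in the type, so the cheaper
// grouping is picked at compile time.
template <std::size_t R, std::size_t K1, std::size_t K2, std::size_t C>
struct Prod3 {
    const Mat<R, K1>& a;
    const Mat<K1, K2>& b;
    const Mat<K2, C>& c;

    static constexpr std::size_t cost_left = R * K1 * K2 + R * K2 * C;   // (ab)c
    static constexpr std::size_t cost_right = K1 * K2 * C + R * K1 * C;  // a(bc)

    operator Mat<R, C>() const {
        if constexpr (cost_left <= cost_right) return mul(mul(a, b), c);
        else return mul(a, mul(b, c));
    }
};

template <std::size_t R, std::size_t K1, std::size_t K2, std::size_t C>
Prod3<R, K1, K2, C> operator*(const Prod2<R, K1, K2>& p, const Mat<K2, C>& c) {
    return {p.a, p.b, c};
}
```

For a 1×100 by 100×1 by 1×100 chain, `(ab)c` costs 200 scalar multiplications while `a(bc)` costs 20,000, so `Mat<1, 100> r = a * b * c;` evaluates the left grouping.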

Conversely, a compiler cannot know what a Hermitian matrix is; it is constrained to evaluate the expression the brute-force way (according to your implementation's floating-point semantics).

score 3 · Answer 2 · answered Dec 07 '17 at 09:11

There are two kinds of expression templates.

One kind is about domain specific languages embedded directly in C++. Boost.Spirit turns expressions into recursive descent parsers. Boost.Xpressive turns them into regular expressions. Good old Boost.Lambda turns them into function objects with argument placeholders.
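
To make this first kind concrete, here is a tiny placeholder language in the spirit of Boost.Lambda, sketched under the assumption of C++17. It is not Boost.Lambda's implementation, and `lam`, `_1`, `Add`, `Mul` and `is_expr` are hypothetical names. The operators build a function object instead of computing a value, which is a change of meaning that no optimizer could supply on its own.

```cpp
#include <type_traits>

namespace lam {

// The placeholder: applied to an argument, it returns that argument.
struct Arg1 {
    template <typename T>
    T operator()(const T& x) const { return x; }
};
inline constexpr Arg1 _1{};

// Expression nodes store a sub-expression and a constant operand; calling a
// node evaluates the sub-expression on the argument and then combines.
template <typename E, typename V>
struct Add {
    E e;
    V v;
    template <typename T>
    auto operator()(const T& x) const { return e(x) + v; }
};

template <typename E, typename V>
struct Mul {
    E e;
    V v;
    template <typename T>
    auto operator()(const T& x) const { return e(x) * v; }
};

// Restricts the overloaded operators below to placeholder expressions.
template <typename> struct is_expr : std::false_type {};
template <> struct is_expr<Arg1> : std::true_type {};
template <typename E, typename V> struct is_expr<Add<E, V>> : std::true_type {};
template <typename E, typename V> struct is_expr<Mul<E, V>> : std::true_type {};

template <typename E, typename V, typename = std::enable_if_t<is_expr<E>::value>>
Add<E, V> operator+(E e, V v) { return {e, v}; }

template <typename E, typename V, typename = std::enable_if_t<is_expr<E>::value>>
Mul<E, V> operator*(E e, V v) { return {e, v}; }

}  // namespace lam
```

For example, `auto f = lam::_1 * 2 + 1;` makes `f` an object of type `lam::Add<lam::Mul<lam::Arg1, int>, int>`, and `f(5)` evaluates to 11. The `is_expr` guard keeps the overloaded operators from touching ordinary arithmetic.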

Obviously there is nothing a compiler can do to get rid of that need. It would take special-purpose language extensions to add the capabilities that the eDSL adds, like lambdas were added to C++11. But it's not productive to do that for every eDSL written; it would make the language gigantic and impossible to comprehend, among other problems.

The second kind is expression templates that keep the high-level semantics unchanged but optimize execution: they apply domain-specific knowledge to transform expressions into more efficient execution paths. A linear algebra library might do that, as Massimiliano explained in his answer, or a SIMD library like Boost.Simd might translate multiple operations into a single fused operation like multiply-add.
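
As a small illustration of such a fusion, the sketch below assumes a hypothetical scalar wrapper `Num` (this is not Boost.SIMD's interface). `x * y` returns a pending `MulExpr` rather than a number, and an overload of `+` taking a `MulExpr` recognizes the multiply-add pattern from the types and evaluates it with `std::fma`, which rounds once instead of twice.

```cpp
#include <cmath>

// Hypothetical scalar wrapper.
struct Num {
    double v;
};

// x * y does not multiply yet; it records a pending product.
struct MulExpr {
    double x, y;
    // Used on its own, the pending product is evaluated normally.
    operator Num() const { return {x * y}; }
};

inline MulExpr operator*(Num a, Num b) { return {a.v, b.v}; }

// The pattern x * y + z is recognised from the operand types and evaluated
// as a single fused multiply-add.
inline Num operator+(MulExpr m, Num z) { return {std::fma(m.x, m.y, z.v)}; }

// Plain addition; other shapes reach it through MulExpr's conversion.
inline Num operator+(Num a, Num b) { return {a.v + b.v}; }
```

With this sketch, `Num{0.1} * Num{10.0} + Num{-1.0}` yields 2^-54, the exact rounding error of `0.1 * 10.0`, whereas two separately rounded operations give 0. Only the `x * y + z` ordering is matched; `z + x * y` falls back to the plain `Num + Num` overload through the conversion operator.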

These libraries provide services that a compiler could, in theory, perform without modifying the language specification. However, in order to do so, the compiler would have to recognize the domain in question and have all the built-in domain knowledge to do the transformations. This approach is way too complex and would make compilers huge and even slower than they are.

An alternative approach to expression templates for these kinds of libraries would be compiler plugins, i.e. instead of writing a special matrix class that has all the expression template magic, you write a plugin for the compiler that knows about the matrix type and transforms the AST that the compiler uses. The problem with this approach is that either compilers would have to agree on a plugin API (not going to happen, they work too differently internally) or the library author has to write a separate plugin for every compiler they want their library to be usable (or at least performant) with.
