Can Tail Call Optimization and RAII Co-Exist?

Question

I can't think of a true RAII language that also has tail call optimization in the specs, but I know many C++ implementations can do it as an implementation-specific optimization.

This poses a question for those implementations that do: given that destructors are invoked at the end of a automatic variable's scope and not by a separate garbage collection routine, doesn't it violate TCO's constraint that a recursive call must be the last instruction at the end of a function?

For example:-

#include <iostream>

class test_object {
public:
    test_object() { std::cout << "Constructing...\n"; }
    ~test_object() { std::cout << "Destructing...\n"; }
};

void test_function(int count);

int main()
{
    test_function(999);
}

void test_function(int count)
{
    if (!count) return;
    test_object obj;
    test_function(count - 1);
}

"Constructing..." would be written 999 times and then "Destructing..." another 999 times. Ultimately, 999 test_object instances would be automatically allocated before the unwind. But assuming an implementation had TCO, would 1000 stack frames exist or just 1?

Does the destructor after the recursive call collide with the defacto TCO implementation requirements?

Interesting though actually. I was going to say "of course it doesn't prevent TCO", but the more I look at it, the more I think it does... — Mooing Duck, Jul 22 '13 at 16:43
There is no tail call here (and check your code, it doesn't compile as is). Though in theory a good compiler could notice the pattern and turn this into 2 loops. — Marc Glisse, Jul 22 '13 at 16:44
Short answer: yes, deterministic destructors prevent TCO. Long answer: actually some compilers (such as LLVM I believe) implement a more permissive form of TCO and might tolerate more cases... — Matthieu M., Jul 22 '13 at 16:45
Yeah, it's kinda hard to tell. I was wondering at what stage of execution the destruction actually happens. I'd assumed it would happen just before the instruction pointer being put back and the stack frame removed, but surely that would happen after the last function statement? I think I may have to get a disassembler out... — Louis Jackman, Jul 22 '13 at 16:46
Thanks for the code fix. Serves me right for some inattentive last-minute class renaming... — Louis Jackman, Jul 22 '13 at 16:51
@ljackman: actually, returning in C++ is supposed to be a two-steps process, and destruction intervenes between those steps. So the timeline is: 1. "Callee" evaluate the return expression and put the return result in a specific "return slot" (may involve a copy/move), 2. "Callee" execute destructors, 3. "Caller" copy/move result from "return slot" into the variable (or temporary) in its own frame. Any two of the copy/move may be optimized away, but semantically the moment the destructors are scheduled is well defined. — Matthieu M., Jul 22 '13 at 17:13
You learn something new everyday... I suppose it depends whether the implementation allows TCO before or after step 2, given that a void function doesn't need to return anything anyway. — Louis Jackman, Jul 22 '13 at 17:17

Mikael Persson · Accepted Answer · 2013-07-22T17:30:29.423

Taken at face-value, it would certainly seem like RAII works against TCO. However, remember that there are a number of ways in which the compiler can "get away with it", so to speak.

The first and most obvious case is if the destructor is trivial, meaning that it is the default destructor (compiler-generated) and all the sub-objects have trivial destructors too, then the destructor is effectively non-existent (always optimized away). In that case, TCO can be performed as usual.

Then, the destructor could be inlined (it's code is taken and put directly in the function as opposed to being called like a function). In that case, it boils down to just having some "clean-up" code after the return statement. The compiler is allowed to re-order operations if it can determine that the end-result is the same (the "as-if" rule), and it will do so (in general) if the re-ordering leads to better code, and I would assume TCO is one of the considerations being applied by most compilers (i.e., if it can re-order things such that the code becomes suitable for TCO, then it will do it).

And for the rest of the cases, where the compiler cannot be "smart enough" to do it on its own, then it becomes the responsibility of the programmer. The presence of this automatic destructor call does make it a bit harder for the programmer to see the TCO-inhibiting clean-up code after the tail call, but it doesn't make any difference in terms of the ability of the programmer to make the function a candidate for TCO. For example:

void nonRAII_recursion(int a) {
  int* arr = new int[a];
  // do some stuff with array "arr"
  delete[] arr;
  nonRAII_recursion(--a);  // tail-call
};

Now, a naive RAII_recursion implementation might be:

void RAII_recursion(int a) {
  std::vector<int> arr(a);
  // do some stuff with vector "arr"
  RAII_recursion(--a);  // tail-call
};  // arr gets destroyed here, not good for TCO.

But a wise programmer can still see that this won't work (unless the vector destructor is inlined, which is likely in this case), and can rectify the situation easily:

void RAII_recursion(int a) {
  {
    std::vector<int> arr(a);
    // do some stuff with vector "arr"
  }; // arr gets destroyed here
  RAII_recursion(--a);  // tail-call
};

And I'm pretty sure you could demonstrate that there are essentially no cases where this kind of trick could not be used to ensure that TCO can be applied. So, RAII merely makes it a bit harder to see if TCO can be applied. But I think programmers that are wise enough to design TCO-capable recursive calls are also wise enough to see those "hidden" destructor calls that would need to be forced to occur before the tail-call.

ADDED NOTE: Look at it this way, the destructor hides away some automatic clean-up code. If you need the clean-up code (i.e., non-trivial destructor), you will need it whether you use RAII or not (e.g., C-style array or whatever). And then, if you want TCO to be possible, it must be possible to do the cleaning up before doing the tail-call (with or without RAII), and it is possible, then it is possible be force the RAII objects to be destroyed before the tail-call (e.g., by putting them inside an extra scope).

If the vector's destructor has an externally-visible side-effect, I think the C++ standard would require that they all be constructed, and then all be destructed in the reverse order, would it not? Knowing whether the destructor of `arr` could have any side-effects would require analysis of all code that *ever* receives a pointer or reference to `arr`. Maybe possible in some cases, but not all. — supercat, Jul 24 '13 at 16:40
@supercat: The standard specifies observable behavior (i.e., behaves "as if" destructed in reverse order), that's a bit of a Schrodinger's cat problem, as soon as you add code to make the order of execution observable, the standard guarantees the order, but when you remove that code, you can't be sure of the order of things you can't observe. — Mikael Persson, Jul 24 '13 at 17:56
@supercat: As for destructing the vector having "externally-visible side-effects", that's of course dependent on the compiler and the situation. It certainly cannot always be determined. Also, how smart the compiler is w.r.t. mem-allocations is also critical (e.g., does it see it as just a function-call with possible side-effects, does it consider changes to the heap as a visible side-effect, etc.), and compilers do have some freedom in that. I agree that clean-up code is hard to re-order safely. — Mikael Persson, Jul 24 '13 at 17:57
I would expect that most compilers would not be smart enough to do TCO to the second code snippet even with an inlined destructor, but haven’t tested it. The reason is that the compiler doesn’t know if `// do some stuff with vector "arr"` includes putting a pointer to `arr` (or a pointer to one of the elements of `arr`) in a place that the recursive call can find (`static` local variable, global variable, `a`, etc.). Even if the compiler knows this didn’t happen, I think the `delete[]` call in `arr`’s destructor is still not reorderable unless the compiler uses strict pointer safety. — Daniel H, Feb 22 '15 at 02:30
If the TCO is all for implementing the PTC (proper tail call) guarantee, which essentially requires O(1) space complexity of the nested active calls, then the cleanup calls can be deferred to the enclosing non-tail context and merged into one instance when the effects are _idempotent_: there is no difference between one call and more than one calls of the cleanup. Note the space consumption itself is not a visible side effect contributing to observable behaviors (in almost every language I've seen). This is still difficult to prove, though. — FrankHB, Mar 29 '21 at 20:21
Such idempotent cleanups in C++ including release of the space of the automatic storage (technically, not counted as "deallocation"), and calls of default (not user-defined) `::operator delete` calls (subject to deallocation merging since C++14) determined statically (as if `consteval`-able) with no leak of information of the arguments. There need more assumptions provided by the language spec for other cases, otherwise it is at least impossible to assume it is safe to ignore the side effects when the definition of the deallocation function is not visible in the C++ translation units. — FrankHB, Mar 29 '21 at 20:30
Also note not only the cleanup may violate the requirements of TCO. Conversions of the returned value can take effect even later than the cleanup calls. For example, when returning an lvalue from a function with an object type as its return type, there is at least a copy initialization (optionally with calls to some user-defined conversion functions). It is still sometimes possible to do TCO because the conversion can be idempotent, with or without the copy elision. — FrankHB, Mar 29 '21 at 20:36

score 5 · Answer 2 · answered Jul 22 '13 at 17:01

5

If the compiler perform the TCO then the order in which destructors are called is changed with respect to when it doesn't do TCO.

If the compiler can prove that this reordering doesn't matter (e.g. if the destructor is trivial) then as per the as-if rule it can perform TCO. However, in your example the compiler can't prove that and will not do TCO.

answered Jul 22 '13 at 17:01

Cassio Neri

19,583
7
46
68

1

In the case of RAII, it will never be able to prove that the reordering doesn't matter, because almost by definition, it does matter. – James Kanze Jul 22 '13 at 17:58
1

@James Kanze. Indeed. If we see RAII in the sense that the destructor *releases a resource* (which is the most accepted sense of the term) then it's impossible to do TCO. If we see RAII in a broader sense, meaning that destructors are called when leaving the current scope, then, in some cases, it's still possible to do TCO. – Cassio Neri Jul 22 '13 at 20:30
1

Destructors which do nothing are _not_ RAII. And in general, the presence of non-trivial constructors or destructors does inhibit TCO. – James Kanze Jul 23 '13 at 07:54
Additional note: as of C++14, the standard [explicitly allows merging of some *new-expression*s](http://www.eel.is/c++draft/expr.new#10) without the aid of as-if rules. So the case gets more interesting. – FrankHB Apr 03 '16 at 09:58

Can Tail Call Optimization and RAII Co-Exist?

2 Answers2

Linked