
In C++ we can ensure foo is called when we exit a scope by putting foo() in the destructor of a local object. That's what I think of when I hear "scope guard." There are plenty of generic implementations.
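
For concreteness, a minimal generic sketch of what I mean (the names are my own, not taken from any particular library):

#include <utility>

// Calls the stored function object when the enclosing scope is left,
// whether normally or via an exception.
template <class F>
class scope_exit {
public:
  explicit scope_exit(F f) : f_(std::move(f)) {}
  ~scope_exit() { f_(); }
  scope_exit(const scope_exit&) = delete;
  scope_exit& operator=(const scope_exit&) = delete;
private:
  F f_;
};

// {
//   scope_exit guard{[]{ foo(); }};
//   do_something();
// }  // foo() runs here, or earlier during stack unwinding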

I'm wondering—just for fun—if it's possible to achieve the behavior of a scope guard with zero overhead compared to just writing foo() at every exit point.

Zero overhead, I think:

{
  try {
    do_something();
  } catch (...) {
    foo();
    throw;
  }
  foo();
}

Overhead of at least 1 byte to give the scope guard an address:

{
  scope_guard<foo> sg;
  do_something();
}
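
As I understand it, the "at least 1 byte" comes from the rule that distinct objects must have distinct addresses, so even an empty class has a nonzero size (a quick check, just for illustration):

struct empty {};
static_assert(sizeof(empty) >= 1, "every complete object has a nonzero size");

Of course, sizeof describes the type; whether the compiler actually reserves stack space for sg is another matter: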

Do compilers optimize away giving sg an address?

A slightly more complicated case:

{
  Bar bar;
  try {
    do_something();
  } catch (...) {
    foo(bar);
    throw;
  }
  foo(bar);
}

versus

{
  Bar bar;
  scope_guard sg{[&]{ foo(bar); }};
  do_something();
}

The lifetime of bar entirely contains the lifetime of sg and its held lambda (destructors run in reverse order of construction), but the lambda held by sg still has to hold a reference to bar. For example, int x; auto l = [&]{return x;}; gives sizeof(l) == 8 on my 64-bit system.
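
To make that concrete, a small check of the sizes I see (illustrative only; the exact numbers are implementation-specific):

#include <cstdio>

int main() {
  int x = 0;
  auto by_ref = [&]{ return x; };  // holds a reference to x
  auto no_capt = []{ return 42; }; // captures nothing

  // On my 64-bit system this prints "8 1": the reference capture costs
  // a pointer, while the capture-less closure is an empty class.
  std::printf("%zu %zu\n", sizeof(by_ref), sizeof(no_capt));
}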

Is there maybe some template metaprogramming magic that achieves the scope_guard sugar without any overhead?

nebuch
  • `sg` is allocated with automatic storage duration - which is most likely going to be on the stack. Are you running into some stack overflow issue or why are you looking for such an optimization? – UnholySheep Nov 18 '21 at 21:04
  • @UnholySheep It's just for fun, not engineering. Of course I don't care about 4 or 8 bytes on the stack almost ever. – nebuch Nov 18 '21 at 21:04
  • *"Overhead of at least 1 byte to give the scope guard an address"* - Looks like a template, and therefore almost certainly inlined. What evidence have you seen of the 1 byte being allocated? – StoryTeller - Unslander Monica Nov 18 '21 at 21:07
  • @StoryTeller-UnslanderMonica That's part of what I'm asking—how can I check if it's inlined in a particular case? Are there profiling tools? Do I need to learn to interpret generated assembly? – nebuch Nov 18 '21 at 21:11
  • I wonder if the first "_Zero overhead, I think:_" version isn't the most expensive version since it needs to "install" an extra exception handler. Pure speculation though ... – Ted Lyngmo Nov 18 '21 at 21:12
  • I typically just test my hypothesis on https://www.godbolt.org and then trust my compiler. – StoryTeller - Unslander Monica Nov 18 '21 at 21:12
  • Assuming the code doesn't introduce any undefined/unspecified behaviour, the compiler can do what it likes, as long as the program produces the required observable behaviour (e.g. given a set of inputs, produce the correct outputs). That can include optimising an object out of existence, so not using memory to store that object. If the code *does* introduce unspecified/undefined behaviour, there are even fewer restrictions on what the compiler does (with undefined behaviour, the compiler can ignore the situation, the program can terminate or reformat your hard drive). – Peter Nov 18 '21 at 21:17
  • If you are genuinely concerned about overflowing your call stack by 1 byte, you may also be interested in [`[[no_unique_address]]`](https://en.cppreference.com/w/cpp/language/attributes/no_unique_address). – Drew Dormann Nov 18 '21 at 21:23
  • "Do I need to learn to interpret generated assembly?" Basically, yes. I can't think of any other tool that would answer this question as accurately or positively. I suppose you could read the source code of your compiler instead, but that seems a lot harder. – Nate Eldredge Nov 18 '21 at 21:27
  • A general rule of thumb is that, unless a local variable has its address taken and passed outside the function, it need not actually occupy space in memory. The compiler may keep it in a register or optimize it entirely out of existence, and that's a pretty standard optimization for compilers to do whenever they can. – Nate Eldredge Nov 18 '21 at 21:30

2 Answers


If by overhead you mean how much space the scope-guard variable occupies, then zero overhead is possible if the function object is a compile-time value. I've written a small snippet to illustrate this:

Try it online!

#include <iostream>

// Scope guard whose function object is a non-type template parameter,
// so the guard type itself needs no data members.
template <auto F>
class ScopeGuard {
public:
    ~ScopeGuard() { F(); }
};

void Cleanup() {
    std::cout << "Cleanup func..." << std::endl;
}

int main() {
    {
        char a = 0;
        ScopeGuard<&Cleanup> sg;
        char b = 0;
        std::cout << "Stack difference "
            << int(&a - &b - sizeof(char)) << std::endl;
    }
    {
        auto constexpr f = []{
            std::cout << "Cleanup lambda..." << std::endl; };
        
        char a = 0;
        ScopeGuard<f> sg;
        char b = 0;
        std::cout << "Stack difference "
            << int(&a - &b - sizeof(char)) << std::endl;
    }
}

Output:


Stack difference 0
Cleanup func...
Stack difference 0
Cleanup lambda...

The code above doesn't put even a single byte on the stack, because a local variable of a class with no fields can be optimized to occupy 0 bytes on the stack; this is one of the obvious optimizations that any compiler does. Of course, if you take a pointer to such an object, the compiler is obliged to give it a 1-byte memory location, but in your case you never take the address of the scope guard.

You can see that not a single byte is occupied by following the Try it online! link above the code; it shows the assembly output of Clang.

To have no fields at all, the scope guard class should only use a compile-time function object, such as a global function pointer or a capture-less lambda. These are the two kinds of objects used in my code above.

In the code above you can even see that I print the stack difference between char variables declared before and after the scope guard variable, to show that the scope guard really occupies 0 bytes.
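
If you want a compile-time confirmation instead of measuring stack addresses, you can also check (my own addition, not part of the snippet above) that the guard type is an empty class:

#include <type_traits>

// ScopeGuard and Cleanup are the ones from the snippet above.
static_assert(std::is_empty_v<ScopeGuard<&Cleanup>>,
    "the guard has no data members, so no per-object state is needed");

Note that an empty class still has sizeof equal to 1; the point is that the compiler is free not to reserve any stack space for it as long as its address is never taken.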


Let's go a bit further and add the possibility of using function objects whose values are not known at compile time.

For this we again create a class with no fields, but now store all function objects inside one shared vector with thread-local storage.

Again, since the class has no fields and we never take a pointer to the scope guard object, the compiler doesn't create a single byte for the scope guard on the stack.

Instead, a single shared vector is allocated on the heap. This way you can trade stack storage for heap storage if you're short on stack memory.

Having a shared vector also lets us use as little memory as possible, because the vector only needs as many elements as there are nested blocks currently using a scope guard. If all the scope guards sit in sequential (non-nested) blocks, the vector holds at most one element at a time, so just a few bytes of memory serve all the scope guards that are ever used.

Why is the heap memory of the shared vector more economical than stack-stored scope guards? Because with stack memory, if you have several sequential blocks of guards:

void test() {
    {
        ScopeGuard sg(f0);
    }
    {
        ScopeGuard sg(f1);
    }
    {
        ScopeGuard sg(f2);
    }
}

then all 3 guards occupy triple the amount of stack memory, because for a function like test() above the compiler typically reserves stack space for all of the function's variables, so for 3 guards it reserves triple the amount.

With the shared vector, the test() function above uses just 1 vector element at a time, so the vector has a size of 1 at most and hence needs only a single slot's worth of memory to store the function object.

Hence, if you have many non-nested scope guards inside one function, the shared vector will be much more economical.

Below I present a code snippet for the shared-vector approach with zero fields and zero stack memory overhead. As a reminder, this approach allows non-compile-time function objects, unlike the solution in the first part of my answer.

Try it online!

#include <iostream>
#include <vector>
#include <functional>

class ScopeGuard2 {
public:
    // One shared per-thread stack of pending cleanup functions.
    static auto & Funcs() {
        thread_local std::vector<std::function<void()>> funcs_;
        return funcs_;
    }
    // Push the cleanup function; the guard object itself stays empty.
    ScopeGuard2(std::function<void()> f) {
        Funcs().emplace_back(std::move(f));
    }
    // Run and pop the most recently registered cleanup function.
    ~ScopeGuard2() {
        Funcs().at(Funcs().size() - 1)();
        Funcs().pop_back();
    }
};

void Cleanup() {
    std::cout << "Cleanup func..." << std::endl;
}

int main() {
    {
        ScopeGuard2 sg(&Cleanup);
    }
    {
        auto volatile x = 123;
        auto const f = [&]{
            std::cout << "Cleanup lambda... x = "
                << x << std::endl;
        };

        ScopeGuard2 sg(f);
    }
}

Output:

Cleanup func...
Cleanup lambda... x = 123
Arty

It's not exactly clear what you mean by 'zero overhead' here.

Do compilers optimize away giving sg an address?

Most likely, modern mainstream compilers will do it when run in optimizing modes. Unfortunately, that's as definite as it gets: it depends on the environment and has to be tested before it can be relied upon.
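
For example, one way to test it is to feed a minimal translation unit to your compiler (locally with something like g++ -std=c++17 -O2 -S, or on godbolt.org) and compare the generated code with and without the guard; a sketch of such a test case (the names are illustrative):

// External functions, so the calls themselves cannot be optimized away.
void foo();
void do_something();

template <auto F>
struct scope_guard {
    ~scope_guard() { F(); }
};

void with_guard() {
    scope_guard<&foo> sg;   // look for any stack slot reserved for sg
    do_something();
}

void by_hand() {            // hand-written reference version
    do_something();
    foo();
}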

If the question is whether there is a guaranteed way to avoid <anything> in the resulting assembly, the answer is no. As @Peter said in the comments, the compiler is allowed to do anything that produces an equivalent result. It may never call foo() at all, even if you write the call there verbatim, when it can prove that nothing in the program's observable behavior would change.

vines