Have the ideas behind the Fast Delegate (et al) been used to optimize std::function?

Question

There have been proposals for C++ "delegates" which have lower overhead than boost::function:

Have any of those ideas been used to implement std::function, resulting in better performance than boost::function? Has anyone compared the performance of std::function vs boost::function?

I want to know this specifically for the GCC compiler and libstdc++ on Intel 64-bit architectures, but information on other compilers is welcome (such as Clang).

`std::function` is an interface, not an implementation. If you want to ask about VC++'s stdlib, libstdc++, or libc++ _specifically_ then that's a valid question, but as-is your question is overly broad — ildjarn, Jun 20 '12 at 18:57
@ildjarn: Read the last sentence of the question. He's asking about specific implementations, and most specifically about libstdc++. — abarnert, Jun 20 '12 at 18:58
@abarnert : How could I respond to it without reading it? Obviously I read it, and I find it overly broad. — ildjarn, Jun 20 '12 at 18:59
@ildjarn: I don't see how mentioning that information on other implementations is also welcome ruins an otherwise-valid question. You can always ignore that part and answer specifically about libstdc++ if you want. — abarnert, Jun 20 '12 at 19:04
@EmileCormier: So to clarify, you primarily want to know whether the current version of libstdc++ uses one of these implementations for std::function? Have you tried looking at the header files? — abarnert, Jun 20 '12 at 19:09
@abarnert : I thought this would be a question of general interest to C++ programmers. I think this question belongs on StackOverflow even if I can find the answer on my own. StackOverflow is more than a "fix my bug" site. :-) If nobody already knows the answer, I will investigate this myself and post my findings for the benefit of the community. — Emile Cormier, Jun 20 '12 at 19:13
@ildjarn: Do you propose that I repost this exact same question for VC++, libstdc++, and libc++? ;) — Emile Cormier, Jun 20 '12 at 19:21
From a quick check of the versions of libstdc++ and libc++ that I have (neither of which are completely up to date), they both seem to use the allocator to create storage for a member function pointer, and I think the key bit in all of those tricks is avoiding that allocation. — abarnert, Jun 20 '12 at 19:40
@abarnert : I think Fast Delegate and friends also try to optimize the overhead of invoking the delegate. I need to read through the articles again -- it's been a long time. — Emile Cormier, Jun 20 '12 at 19:46

Jonathan Wakely · Accepted Answer · score 28 · 2012-06-20T21:56:21.963

In libstdc++'s std::function we use a union type that is suitably sized and aligned to store pointers, function pointers or pointers to member functions. We avoid a heap allocation for any function object that can be stored in that size and alignment, but only if it is "location invariant":

```cpp
/**
 *  Trait identifying "location-invariant" types, meaning that the
 *  address of the object (or any of its members) will not escape.
 *  Also implies a trivial copy constructor and assignment operator.
 */
```

The code is based on the std::tr1::function implementation and that part hasn't changed significantly. I think that could be simplified using std::aligned_storage and could be improved by specializing the trait so that more types are identified as location invariant.

Invoking the target object is done without any virtual function calls. Type erasure is done by storing, in the std::function, the addresses of function template specializations instantiated for the target type: an invoker that calls the target, and a manager that performs every other operation (copying, destroying, querying the target type). The manager is called through the stored pointer with an enum identifying which operation it is being asked to perform. This means there is no vtable, and a single manager pointer covers all of the non-call operations.

This design was contributed by the original boost::function author and I believe it is close to the boost implementation. See the Performance docs for Boost.Function for some rationale. That means it's pretty unlikely that GCC's std::function is any faster than boost::function, because it's a similar design by the same person.

N.B. our std::function doesn't support construction with an allocator yet; any allocations it needs to do will be done using new.

In response to Emile's comment expressing a desire to avoid a heap allocation for a std::function which holds a pointer to member function and an object, here's a little hack to do it (but you didn't hear it from me ;-)

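```cpp
#include <functional>

struct A {
  int i = 0;
  int foo() const { return 0; }
};

// Small callable holding only a pointer to an A; calls A::foo on that object.
struct InvokeA
{
  int operator()() const { return a->foo(); }
  A* a;
};

// The hack: tell libstdc++ that InvokeA is safe to keep in std::function's
// internal buffer. __is_location_invariant is an internal libstdc++ trait,
// so this is non-portable and depends on implementation details.
namespace std
{
  template<> struct __is_location_invariant<InvokeA>
  { static const bool value = true; };
}

int main()
{
  A a;
  InvokeA inv{ &a };

  // inv fits in the small object buffer, so no heap allocation is needed.
  std::function<int()> f2(inv);

  return f2();
}
```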

The trick is that `InvokeA` is small enough to fit in the function's small object buffer, and the trait specialization says it's safe to store there, so the function holds a copy of that object directly rather than on the heap. This requires `a` to outlive every call made through `f2`, but that would be the case anyway if the function's target were `bind(&A::foo, &a)`.

edited Jun 20 '12 at 21:56

answered Jun 20 '12 at 20:34

Jonathan Wakely



Wow, straight from the horse's mouth! If I constructed a `std::function` from this expression `std::bind(&Object::memberFunction, objectInstance)`, would that qualify as "location invariant"? – Emile Cormier Jun 20 '12 at 20:57
If you pass a pointer and an enum, to me that implies there's some kind of `switch` statement in the implementation that switches according to the enum value. Is the value of the enum a constant that can be resolved at compile time? If not, then I don't see how this is more efficient than a vtable lookup. – Emile Cormier Jun 20 '12 at 21:02

No, that would not be location invariant, because the result of that bind expression would contain the pointer to member function _and_ a copy of your object! I plan to increase the size of the union so it could hold a pointer to member and a pointer, so that `bind(&O::f, &o)` would be OK, but that will be relatively tricky. – Jonathan Wakely Jun 20 '12 at 21:08

Yes, there's a switch statement. For a given call the enumerator used is known at compile-time. Calling the manager function through the function pointer will not be any faster than a virtual call, but that's not the only overhead. Avoiding a vtable is beneficial, and only having _one_ function to call for _all_ operations might have benefits over having several virtual functions. TBH I haven't benchmarked it, I assumed the original `boost::function` author knew what he was doing ;) – Jonathan Wakely Jun 20 '12 at 21:17

I think invoking a member function bound to an object instance via `std::function` is a pretty common use case, so I'd be very happy to see a `std::function` implementation that avoids heap allocation for that situation. – Emile Cormier Jun 20 '12 at 21:22
@ipapadop, if your lambda has no captures then converting the closure to a function pointer would make it get stored directly in the `std::function`, because function pointers are location invariant. If you have captures then no, there is no simple way. – Jonathan Wakely May 24 '13 at 11:06
@JonathanWakely, regarding your "hack", is there some kind of static assert we could issue to verify that `InvokeA` is small enough to fit in the function's small object buffer? – Emile Cormier Jun 20 '14 at 23:20

@EmileCormier, only by comparing it to the size of the buffer used by `std::function`, which you can find in the source code – Jonathan Wakely Jun 22 '14 at 19:08

I've just committed a change that makes the `__is_location_invariant` specialization above unnecessary, so GCC 5.0 won't allocate memory for `InvokeA` – Jonathan Wakely Oct 09 '14 at 19:08

score 4 · Answer 2 · answered Oct 21 '12 at 16:59

As noted in the comments, std::function is only an interface, and different implementations may do different things, but it's worth noting that the standard does actually have something to say about this matter. From 20.8.11.2.1/5 (which looks more like an IP address than a part of the standard):

Note: Implementations are encouraged to avoid the use of dynamically allocated memory for small callable objects, for example, where f’s target is an object holding only a pointer or reference to an object and a member function pointer. —end note

This is the standard's way of encouraging implementers to employ the "small function optimization," which was motivated by the cited articles on delegates. (The articles themselves don't actually talk about delegates in the .NET sense. Rather, they use the term "delegate" to mean bound member functions.)
