How close does tcmalloc come to pure stack allocation performance?

Question

I was reasoning that if tcmalloc maintains a per-thread free list from which dynamic allocations are satisfied, then its average-case performance should be very close to that of stack allocation (the cost of resizing the pool is amortized over many operations).

Does this hold in practice? Are there degenerate cases I'm not thinking of?
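
To make the premise concrete, here is a minimal sketch of the kind of per-thread free list described above. It assumes a single fixed block size, and the names `ThreadCache`, `cache_alloc`, and `cache_free` are hypothetical. It is an illustration of the idea only, not tcmalloc's actual implementation or API.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Assumption: one size class only; a real allocator keeps many.
constexpr std::size_t kBlockSize = 64;

// A free block stores the link to the next free block inside itself.
struct FreeNode { FreeNode* next; };
static_assert(kBlockSize >= sizeof(FreeNode), "block must hold a link");

// Hypothetical per-thread cache: a singly linked list of free blocks.
struct ThreadCache {
    FreeNode* head = nullptr;
    ~ThreadCache() {
        // Return cached blocks to the underlying allocator at thread exit.
        while (head) {
            FreeNode* n = head;
            head = n->next;
            std::free(n);
        }
    }
};

thread_local ThreadCache t_cache;

void* cache_alloc() {
    if (FreeNode* n = t_cache.head) {
        // Fast path: pop from this thread's list; no lock, no system call.
        t_cache.head = n->next;
        return n;
    }
    // Slow path: the list is empty, so fall back to the general allocator.
    return std::malloc(kBlockSize);
}

void cache_free(void* p) {
    if (!p) return;
    // Push the block onto the calling thread's list for later reuse.
    FreeNode* n = ::new (p) FreeNode{t_cache.head};
    t_cache.head = n;
}
```

Even in this sketch the fast path is a function call, a thread-local lookup, a load, a branch, and a store, so it is more work than adjusting the stack pointer. It also shows one possible degenerate case: if one thread always allocates and another always frees, the blocks pile up on the freeing thread's list and the allocating thread takes the slow path every time.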

A non-trivial benefit of the stack is that it's likely to be in CPU cache, and on an open page. (DDR RAM, despite its name, isn't truly random access; it has real locality of reference.) — MSalters, Jun 19 '17 at 17:32
If you want to use stack allocation, try alloca. Remember not to free the pointer. — Robert Jacobs, Jun 19 '17 at 17:49
I'd like to get the benefits of the longer object lifetime of dynamic memory though. If the thread-pool were managed behind the scenes, then this would be very convenient. It does sound almost too good to be true though. — Nathan Doromal, Jun 19 '17 at 17:53

score 1 · Answer 1 · answered Jun 19 '17 at 17:34

Stack allocation consists of a single machine instruction: adjusting the stack pointer. It's hard to see how any other scheme can approach this efficiency. And you typically use stack allocation and dynamic allocation via malloc-like functions (which of course have function-call overhead) for different purposes, so the issue of which is "faster" is kind of moot.
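
As a rough illustration of the contrast drawn above, the following sketch does the same work with a stack buffer and with a malloc'd buffer, then shows the case where only dynamic allocation will do. The function names and the buffer size `kCount` are assumptions made for this example. It makes no timing claims, since an optimizing compiler may transform either version.

```cpp
#include <cstddef>
#include <cstdlib>

// Assumed small enough to fit comfortably in a stack frame.
constexpr std::size_t kCount = 256;

// Hypothetical helper: fills a scratch buffer with 0..n-1 and sums it.
static long fill_and_sum(int* buf, std::size_t n) {
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<int>(i);
        sum += buf[i];
    }
    return sum;
}

// Stack version: the buffer is part of the function's frame, so its space
// is reserved when the frame is set up and released on return.
long sum_on_stack() {
    int buf[kCount];
    return fill_and_sum(buf, kCount);
}

// Heap version: pays for a call into the allocator and a matching free.
long sum_on_heap() {
    int* buf = static_cast<int*>(std::malloc(kCount * sizeof(int)));
    if (!buf) return -1;  // allocation can fail; stack reservation has no such check
    long sum = fill_and_sum(buf, kCount);
    std::free(buf);
    return sum;
}

// Different purpose: the buffer must outlive the call that creates it,
// which a stack buffer cannot do. The caller owns it and must std::free it.
int* make_buffer(std::size_t n) {
    return static_cast<int*>(std::malloc(n * sizeof(int)));
}
```

The first two functions are interchangeable only because the buffer's lifetime ends with the call. `make_buffer` has no stack equivalent, which is the sense in which the two mechanisms serve different purposes.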

Actually, on Windows it isn't always a single instruction. You generally need one stack probe per additional stack page, so that Windows commits actual RAM. A good compiler might prove that all data access is sequential, in which case that probe can be optimized away, but that would be exceptional. – MSalters Jun 19 '17 at 18:22
