
Follow-up to What the heque is going on with the memory overhead of std::deque?

Visual C++ sizes deque blocks according to the container's element type, using this:

#define _DEQUESIZ   (sizeof (value_type) <= 1 ? 16 \
    : sizeof (value_type) <= 2 ? 8 \
    : sizeof (value_type) <= 4 ? 4 \
    : sizeof (value_type) <= 8 ? 2 \
    : 1)    /* elements per block (a power of 2) */

This results in a very large memory footprint for small elements. By changing the 16 in the first line to 128, I was able to drastically reduce the footprint required for a large deque<char>: Process Explorer Private Bytes dropped from 181 MB to 113 MB after 100m push_back(const char& mychar) calls.
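For reference, a minimal version of that test looks roughly like this (sketch only; it's not the exact harness behind the numbers above):

#include <cstdio>
#include <deque>

int main()
{
    std::deque<char> d;
    for (int i = 0; i < 100000000; ++i)  // 100 million push_back calls
        d.push_back('x');
    std::getchar();  // pause here and read off Private Bytes in Process Explorer
}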

  • Can anybody justify the values in that #define?
  • How do other compilers handle deque block sizing?
  • What would their footprint be (in a 32-bit process) for the simple test of 100m push_back calls to deque<char>?
  • Does STL allow for overriding of this block size at compile-time, without modifying the <deque> code?
Steve Townsend
  • Couldn't you write a custom allocator that allocates larger blocks? – Björn Pollex Nov 04 '10 at 14:44
  • @Space_C0wb0y - yes, though I gather that's discouraged (Meyers et al). – Steve Townsend Nov 04 '10 at 14:50
  • @Space_C0wb0y - a custom allocator will probably help, but won't completely solve this. Each deque block will still have the deque's housekeeping variables in it, hence there'll be overhead anyway. – valdo Nov 04 '10 at 15:01
  • @valdo - not only per-block info but heap manager per-block info. For `deque`, overhead is almost the same as member storage. Why would they do it this way? – Steve Townsend Nov 04 '10 at 15:04
  • @Steve: "why would they do it this way?" - the first answer I can think of is that MS/Dinkumware hasn't really looked into optimizing `deque` in this respect. From that comment in the GCC source @AProgrammer quotes, neither has GNU, it's just blessed with a constant that's more suited to large `deques`. Create 10m deques, each containing 10 chars, and suddenly Visual C++ is a genius and GCC is the bad guy with the 5000% overhead. Presumably it would be possible to implement a `deque` that starts with small blocks, and increases the block size later. – Steve Jessop Nov 04 '10 at 16:35
  • @Steve Jessop - yes, my thinking so far is that it's simply not high on their 'to do' list, rather than that what they have reflects some well-thought-through design choice. – Steve Townsend Nov 04 '10 at 16:39

3 Answers


gcc has

return __size < 512 ? size_t(512 / __size) : size_t(1);

with a comment

/*  The '512' is
 *  tunable (and no other code needs to change), but no investigation has
 *  been done since inheriting the SGI code.
 */
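For a rough idea of what the two formulas give, this standalone snippet just evaluates the expression above and the `_DEQUESIZ` macro from the question (illustration only, not library code):

#include <cstddef>
#include <cstdio>

// GCC/libstdc++: aim for roughly 512-byte blocks.
constexpr std::size_t gcc_elems(std::size_t sz)  { return sz < 512 ? 512 / sz : 1; }

// Visual C++: the _DEQUESIZ macro quoted in the question.
constexpr std::size_t msvc_elems(std::size_t sz)
{
    return sz <= 1 ? 16 : sz <= 2 ? 8 : sz <= 4 ? 4 : sz <= 8 ? 2 : 1;
}

int main()
{
    for (std::size_t sz : {1, 2, 4, 8, 16, 64})
        std::printf("elem size %2zu: gcc %3zu elems (%4zu B/block), msvc %2zu elems (%3zu B/block)\n",
                    sz, gcc_elems(sz), sz * gcc_elems(sz),
                    msvc_elems(sz), sz * msvc_elems(sz));
}

For deque<char> that's 512 elements (512 bytes) per block under GCC versus 16 elements (16 bytes) per block under Visual C++, before any per-block bookkeeping or heap-manager overhead.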
AProgrammer
  • Could you expand this a bit ? Namely: what is `__size` here ? – Matthieu M. Nov 04 '10 at 16:20
  • @Matthieu: it's actually the parameter to an inline function, `__deque_buf_size`, but everywhere it's called the argument is `sizeof(_Tp)`, and at a glance it looks as though `_Tp` is always the value type of the deque. – Steve Jessop Nov 04 '10 at 16:28
  • @Steve: thanks, then it means that with GCC the block size is about 512 bytes (when `sizeof(_Tp) < 512`; otherwise it's the size of the object). It's interesting this didn't evolve as caches became larger. Of course there is more potential waste with this scheme (i.e. if you only have one `char` in your deque you still get a 512-byte block...) – Matthieu M. Nov 04 '10 at 18:39

The Dinkumware (MS) implementation wants to grow the deque by 16 bytes at a time. Could it be that this is just an extremely old implementation (the first one ever, perhaps?) that was tuned for platforms with very little memory (by today's standards), to prevent over-allocating and exhausting memory (as a std::vector will do)?

I had to implement my own queue in an application I'm working on because the 2.5X memory footprint of std::queue (which uses std::deque) was unacceptable.
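The general shape of such a queue is something like the following sketch (illustration only, not the actual code; the `block_queue` name and its defaults are made up). The block size is a template parameter, so the space/overhead trade-off is under the caller's control:

#include <cstddef>
#include <deque>
#include <vector>

// FIFO queue built on fixed-size blocks. The std::deque here only holds one
// vector per block, so its own per-element overhead is negligible next to
// the payload.
template <typename T,
          std::size_t BlockElems = (sizeof(T) < 4096 ? 4096 / sizeof(T) : 1)>
class block_queue
{
public:
    void push(const T& value)
    {
        if (blocks_.empty() || blocks_.back().size() == BlockElems)
        {
            blocks_.emplace_back();
            blocks_.back().reserve(BlockElems);  // one allocation per block
        }
        blocks_.back().push_back(value);
    }
    T&   front()       { return blocks_.front()[front_pos_]; }
    void pop()
    {
        if (++front_pos_ == blocks_.front().size())
        {
            blocks_.pop_front();   // whole block released at once
            front_pos_ = 0;
        }
    }
    bool empty() const { return blocks_.empty(); }

private:
    std::deque<std::vector<T>> blocks_;
    std::size_t front_pos_ = 0;  // read index into the front block
};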

There seems to be very little evidence on the interwebs that people have run into this inefficiency, which is surprising to me. I would think such a fundamental data structure as a queue (standard library, no less) would be quite ubiquitous in the wild, and would be in performance/time/space-critical applications. But here we are.

To answer the last question, the C++ standard does not define an interface to modify the block size. I'm pretty sure it doesn't mandate any implementation, just complexity requirements for insertions/removals at both ends.

Tabber33
  • An interesting conjecture. Dinkumware code at http://www.dinkumware.com/deque.txt actually uses this, and refs VC++ 5.0 which is ancient. #define _DEQUESIZ (4096 < sizeof (_Ty) ? 1 : 4096 / sizeof (_Ty)) – Steve Townsend Nov 04 '10 at 17:03
  • Interesting... well I guess that blows my theory, since the *current* implementation has made `_DEQUESIZ` smaller! – Tabber33 Nov 04 '10 at 17:11

STLPort

... seems to use:

::: <stl/_alloc.h>
...
enum { _MAX_BYTES = 32 * sizeof(void*) };
...
::: <deque>
...
static size_t _S_buffer_size()
{
  const size_t blocksize = _MAX_BYTES;
  return (sizeof(_Tp) < blocksize ? (blocksize / sizeof(_Tp)) : 1);
}

So that would mean a block size of 32 x 4 = 128 bytes on 32-bit and 32 x 8 = 256 bytes on 64-bit.

My thought: From a size-overhead point of view, I guess it would make sense for any implementation to operate with variable-length blocks, but I think this would be extremely hard to get right with the constant-time random access requirement of deque.
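To make that concrete: with a fixed block size, locating element `i` is just a division and a modulo into the block map, whereas variable-sized blocks would need a search or a per-block offset table. A hypothetical sketch (not STLport code):

#include <cstddef>

// How a fixed block size gives O(1) random access: two integer ops and two
// dereferences. block_map points to the array of block pointers; first_offset
// is the position of the first element within the first block.
template <typename T>
T& deque_element(T** block_map, std::size_t first_offset,
                 std::size_t i, std::size_t block_size)
{
    const std::size_t pos = first_offset + i;
    return block_map[pos / block_size][pos % block_size];
}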

As for the question

Does STL allow for overriding of this block size at compile-time, without modifying the code?

Not possible here either.

Apache

(seems to be the Rogue Wave STL version) apparently uses:

static size_type _C_bufsize () {
    // deque only uses __rw_new_capacity to retrieve the minimum
    // allocation amount; this may be specialized to provide a
    // customized minimum amount
    typedef deque<_TypeT, _Allocator> _RWDeque;
    return _RWSTD_NEW_CAPACITY (_RWDeque, (const _RWDeque*)0, 0);
}

so there seems to be some mechanism to override the block size via specialization, and the definition of `__rw_new_capacity` looks like this:

// returns a suggested new capacity for a container needing more space
template <class _Container>
inline _RWSTD_CONTAINER_SIZE_TYPE
__rw_new_capacity (_RWSTD_CONTAINER_SIZE_TYPE __size, const _Container*)
{
    typedef _RWSTD_CONTAINER_SIZE_TYPE _RWSizeT;

    const _RWSizeT __ratio = _RWSizeT (  (_RWSTD_NEW_CAPACITY_RATIO << 10)
                                       / _RWSTD_RATIO_DIVIDER);

    const _RWSizeT __cap =   (__size >> 10) * __ratio
                           + (((__size & 0x3ff) * __ratio) >> 10);

    return (__size += _RWSTD_MINIMUM_NEW_CAPACITY) > __cap ? __size : __cap;
}

So I'd say it's, aehm, complicated.
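As far as I can tell, the shifts are just fixed-point arithmetic: `__ratio` is `_RWSTD_NEW_CAPACITY_RATIO / _RWSTD_RATIO_DIVIDER` scaled by 1024, `__cap` is roughly `__size` times that ratio, and the return value is effectively `max(__size + minimum, __cap)`. Since deque calls it with `__size == 0`, the block size appears to boil down to `_RWSTD_MINIMUM_NEW_CAPACITY` unless the template is specialized. A toy re-implementation with made-up numbers (the real `_RWSTD_*` defaults may differ):

#include <cstddef>
#include <cstdio>

// Toy version of __rw_new_capacity with made-up values. It computes roughly
//   max(size + minimum, size * ratio_num / ratio_den)
// using 10-bit fixed-point math to avoid overflow.
std::size_t new_capacity(std::size_t size, std::size_t ratio_num,
                         std::size_t ratio_den, std::size_t minimum)
{
    const std::size_t ratio = (ratio_num << 10) / ratio_den;         // ratio in 1/1024ths
    const std::size_t cap   = (size >> 10) * ratio
                            + (((size & 0x3ff) * ratio) >> 10);      // ~ size * num / den
    return (size += minimum) > cap ? size : cap;
}

int main()
{
    std::printf("%zu\n", new_capacity(0, 3, 2, 32));     // deque case: prints 32 (the minimum)
    std::printf("%zu\n", new_capacity(1000, 3, 2, 32));  // growth case: prints 1500 (1.5x)
}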

(If anyone feels like figuring this out further, feel free to edit my answer directly or just leave a comment.)

Martin Ba