11

In a German programming book from 2012 called "C++ für C-Programmierer" (C++ for C programmers, duh!), which I bought as a reference, I found the following passage in the chapter about the STL (I'll translate right away for you guys):

Most STL implementations are generous in terms of memory management. Vectors, for instance, are mostly allocated in 1 KB chunks. The clippings do not really matter if one allocates only a few vectors, but they do if you create tens to hundreds of thousands of them.

I could not find any source confirming that. I know it depends on the implementation, but I could not find a confirmation for even a single platform. Cplusplus.com merely states:

[...]

Therefore, compared to arrays, vectors consume more memory in exchange for the ability to manage storage and grow dynamically in an efficient way.

What have I tried so far?

I wrote a little C++ program exploiting the OS X-specific malloc_size() function, but I have never used it and I am pretty sure I am doing it wrong. If I do something along the lines of:

std::vector<int>* i = new std::vector<int>;
std::cout << malloc_size(i) << std::endl;

Cout merely tells me 32, which may well just be the size of the empty vector object itself and would therefore prove the author partially wrong, but I am not really convinced by my own efforts.

Does anyone know better or know a resource? Thanks in advance.

Regards, Carson

  • As you said, it depends on the implementation, but a traditional approach is to allocate twice the memory each time the vector grows. So, if it allocates space for 128 elements, and later more space is needed, it grows to 256, and so forth. *Accelerated C++*, which I highly recommend, provides an implementation of a vector class template using this technique. – Filipe Gonçalves May 29 '14 at 16:52
  • Is `malloc_size` clever enough to go from a pointer to a single vector (why are you doing that, by the way?) to all the memory allocated internally by the vector the pointer points to? – juanchopanza May 29 '14 at 16:54
  • Rogrammers? Rogue Rammers? Mmm, German C++... – Kerrek SB May 29 '14 at 16:54
  • use std::vector::capacity(), not malloc_size() – May 29 '14 at 16:55
  • Whatever you do, the implementation has to guarantee that the amortized cost of pushing back elements should be constant. – Kerrek SB May 29 '14 at 16:56
  • @juanchopanza, I did that because `malloc_size()` takes a pointer. I did it without new and with `&i` before, but that told me it was `0`, which did not really seem right. But I might be mistaken. Could also be the size of the pointer, I guess. @Arkadiy, I did that, but that also returned `0`, which did not seem right. @Kerrek-SB, I lol'd. – hellerve May 29 '14 at 17:00
  • @Carson You could instantiate the vector in automatic storage, and pass `v.data()` (C++11) or `&v[0]` (C++03) to `malloc_size`. But I wonder if they are referring to some kind of memory pool in the book. – juanchopanza May 29 '14 at 17:02
  • @juanchopanza, if I do that, it's 0 again. Does that mean it does not automatically allocate anything at all? When I do the following: http://pastebin.com/gQSqEpuh , cout gives me 0, then 16. That should disprove the claim for my implementation at least. – hellerve May 29 '14 at 17:08
  • What if you initialize it to a size larger than 0? e.g. `std::vector<int> v(128);`. – juanchopanza May 29 '14 at 17:10
  • You could simply check the source code for vector. – Don Reba May 29 '14 at 17:17
  • @Don Reba, that would tell me how it is for my implementation of the STL. But what I am aiming for is a broader perspective. @juanchopanza, it seems like my implementation does a double-size-when-needed approach, which I verified with the following code: http://pastebin.com/1AWFy46z – hellerve May 29 '14 at 17:20
  • When comparing vectors to C-style arrays you should also consider `std::array` – M.M May 30 '14 at 05:58

3 Answers

18

Your code doesn't measure what you want it to measure. The vector structure itself is usually quite small. It basically contains a few fields necessary for tracking the allocated memory and a pointer to that memory. What you want to measure is different.

------        -------------------
| i  |------> | A few fields    |
------        | (e.g., size and |
              |  capacity)      |         -------------------
              |-----------------|         | Space allocated |
              |   pointer       |-------> | for elements    |
              -------------------         -------------------
                 ^                            ^
               What your code               What you want to
               measures                     measure

You can probably supply a custom allocator to vector that tracks and reports the size of requested allocations. The GCC 4.8.1 implementation on my computer allocates no memory for a default constructed vector (since it has no elements), and uses the double-size-on-every-growth implementation noted in the comments.
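
For illustration, here is a minimal sketch of such a logging allocator (my own example, not code from this answer; it assumes a standard library that accepts a C++11 minimal allocator). Default construction should print nothing, and every reallocation during push_back reports the requested size:

#include <cstddef>
#include <iostream>
#include <vector>

// Minimal C++11 allocator that reports every allocation request a vector makes.
template <typename T>
struct LoggingAllocator {
    using value_type = T;

    LoggingAllocator() = default;
    template <typename U>
    LoggingAllocator(const LoggingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        std::cout << "allocate " << n << " element(s) = " << n * sizeof(T) << " bytes\n";
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) {
        std::cout << "deallocate " << n * sizeof(T) << " bytes\n";
        ::operator delete(p);
    }
};

template <typename T, typename U>
bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return false; }

int main() {
    std::vector<int, LoggingAllocator<int>> v;   // default construction: no output expected
    for (int i = 0; i < 100; ++i)
        v.push_back(i);                          // every reallocation is logged
}

On the GCC and Clang setups I have tried, this typically shows capacities doubling (1, 2, 4, 8, ...); the exact pattern is up to the implementation.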

thefourtheye
T.C.
  • Thanks for the insight. I was able to confirm that my implementation works similarly with that code: http://pastebin.com/1AWFy46z ; do you know of a platform where the author is right? – hellerve May 29 '14 at 17:15
  • @Carson: I know of no stdlib where the author is right, but that doesn't mean it's never happened. Could be common in Germany for all I know. – Mooing Duck May 29 '14 at 18:23
11

The vector object itself consists of only a few pointers, so the 32-byte size you showed is not surprising, and it won't change over time.

I believe the text of the book is referring to the storage allocated for the contents of the vector. As you add items to the vector, it will allocate space to store them, but that space won't be reflected in malloc_size.

You can figure out how much space the vector has allocated by calling the vector's capacity() method. This will tell you how many items it can hold. If you want the size in bytes, you can multiply the capacity by the size of the element type.
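
For example (a small sketch of my own, using only standard members):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    for (int i = 0; i < 1000; ++i)
        v.push_back(i);

    std::cout << "elements:        " << v.size() << '\n'
              << "bytes in use:    " << v.size() * sizeof(int) << '\n'
              << "bytes allocated: " << v.capacity() * sizeof(int) << '\n';
}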

The quoted text talks about 1 KB blocks. Older dynamic containers used linear growth schemes when they needed to grow. But the runtime complexity requirements that the standard places on std::vector don't allow for that approach. Instead, a vector must grow by some percentage of its current size.

Many implementations use 100%. So if a vector currently has room for 10 items, and it needs to grow, it'll resize up to 20 items. If it needs to grow even farther, it'll resize up to 40 items, and so on. Thus, in the worst case, you can end up with a vector that has allocated almost twice as much space as would actually be needed. Some implementations use 50%, which still meets runtime complexity requirements without growing quite as fast or "wasting" as much space. (There is at least one other advantage to using a factor less than 100%, but it's not relevant to this discussion.)
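
If you want to see which factor your own implementation uses, something along these lines (my sketch; the numbers printed depend entirely on your standard library) will report every capacity change:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last = v.capacity();
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {
            std::cout << "capacity grew from " << last << " to " << v.capacity() << '\n';
            last = v.capacity();
        }
    }
}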

On a modern computer with virtual memory, either method is usually fine--the performance will be more important than the unused memory. If you're on an embedded system with limited resources, you might want to exercise more direct control. There are tricks like copy-and-swap that can prune a vector with excess capacity down to a size that's close to the actual need.
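
The copy-and-swap trick mentioned above can look like this (a sketch; in C++11 you can also call shrink_to_fit(), though that request is non-binding):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    for (int i = 0; i < 1000; ++i)
        v.push_back(i);
    v.resize(10);   // logical size shrinks, but the capacity usually stays put
    std::cout << "before: size " << v.size() << ", capacity " << v.capacity() << '\n';

    // Copy-and-swap: the temporary is built from v's elements only, so its
    // capacity is close to v.size(); swapping hands that tight buffer to v.
    std::vector<int>(v.begin(), v.end()).swap(v);
    std::cout << "after:  size " << v.size() << ", capacity " << v.capacity() << '\n';
}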

Adrian McCarthy
  • Thanks for the insight. I was able to confirm that my implementation works similarly to the doubling approach with that code: pastebin.com/1AWFy46z ; do you know of a platform where the author is right? – hellerve May 29 '14 at 17:17
  • There's not enough context (or something is lost in translation) to know for sure if the author is right or wrong. In a sense, even with geometric growth, it still might be that the implementation chooses multiples of a block size like 1 KB. No standards-compliant version of std::vector uses linear growth, though linear growth was pretty common back when people were implementing their own dynamic arrays. – Adrian McCarthy May 29 '14 at 17:29
  • I tried to keep the translation as close as possible to the source, but that might be the point. Maybe it's just the book, though; I am not too impressed with it. – hellerve May 29 '14 at 17:31
5

The book is incorrect in several ways. std::vector grows according to a geometric series, so a certain percentage will always be filled (unless you erase elements). The clippings may add up, but in general they will be a fraction proportional to the memory actually used. 50-65% is a typical worst-case lower bound for the fraction that is actually in use.

This is not implementation dependent. The geometric series is required to ensure that push_back takes O(1) amortized time. Linear growth would result in O(N) amortized time (or O(N^2) to push_back a sequence of N values).
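
To make the difference concrete, here is a small simulation of my own (not part of the original answer) that counts how many element copies N push_backs cause under doubling versus growth in fixed chunks of 256 ints (roughly 1 KB). The per-push_back cost stays roughly constant for the former and keeps growing with N for the latter:

#include <cstddef>
#include <iostream>

// Counts element copies caused by n push_backs under a given growth policy.
// grow(capacity) returns the new capacity chosen when the old one is full.
template <typename GrowthPolicy>
std::size_t copies_for(std::size_t n, GrowthPolicy grow) {
    std::size_t capacity = 0, size = 0, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copies += size;              // a reallocation copies every existing element
            capacity = grow(capacity);
        }
        ++size;
    }
    return copies;
}

int main() {
    const std::size_t n = 1000000;
    std::size_t geometric = copies_for(n, [](std::size_t c) { return c ? 2 * c : std::size_t(1); });
    std::size_t linear    = copies_for(n, [](std::size_t c) { return c + 256; });
    std::cout << "doubling:       " << static_cast<double>(geometric) / n << " copies per push_back\n";
    std::cout << "256-int chunks: " << static_cast<double>(linear) / n << " copies per push_back\n";
}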

The implementation may choose a non-negligible minimum capacity for a non-empty vector, but there's little reason to do so, because small vectors are common. Likewise, default-initialized vectors are invariably implemented to reserve no dynamic memory at all.
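
You can check the last point directly; on the implementations I have tried (libstdc++ and libc++) this prints 0:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;                 // default-constructed, no elements
    std::cout << v.capacity() << '\n';  // 0 here: no dynamic memory reserved yet
}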

You don't need malloc_size to learn how much memory is being reserved. Just use v.capacity() * sizeof( elem_type ).

Potatoswatter
  • Thank you very much! That clears it up for me. Even if you're late to the party, that's my favourite answer up until now. – hellerve May 30 '14 at 08:00
  • @Carson I wouldn't have answered unless I thought I could do better. Feel free to click the checkmark. – Potatoswatter May 30 '14 at 09:25