
I have an existing algorithm and I need to optimize it slightly if possible. Changing a lot in this algorithm is not an option at the moment. The algorithm works with an instance of std::vector< std::vector<unsigned char> >. It looks like this:

typedef std::vector<unsigned char> internal_vector_t;
std::vector< internal_vector_t > internal_vectors; 

while (fetching lots of records) {
   internal_vector_t tmp;
   // reads 1Mb of chars in tmp...
   internal_vectors.push_back(tmp);
   // some more work
}

// use this internal_vectors

The algorithm inserts many instances of internal_vector_t into internal_vectors using push_back(). Most instances of internal_vector_t are 1 MB in size. Since the final size of internal_vectors is unknown, no reserve() is done beforehand.

The first thing that I don't understand is what happens when internal_vectors reaches its current capacity, needs to allocate a bigger block of memory, and copies its current contents into it. Since most of the blocks are 1 MB in size, copying is a long operation. Should I expect that a compiler (gcc 4.3, MS VC++ 2008) will manage to optimize this in order to avoid copying?

If copying is unavoidable, will changing to std::deque help? I am considering std::deque because I still need access by index, like internal_vectors[10]. Like this:

typedef std::vector<unsigned char> internal_vector_t;
std::deque< internal_vector_t > internal_vectors; 
// the same while

As far as I understand, std::deque does not need to relocate memory that was once allocated. Am I right that std::deque in this situation will require less allocation and copying on push_backs?


Update:
1) According to DeadMG MSVC9 does this type of optimization (The Swaptimization - TR1 Fixes In VC9 SP1). gcc 4.3 probably doesn't do this type of optimization.

2) I have profiled the version of the algorithm that uses std::deque< std::vector<unsigned char> > and I see that its performance is better.

3) I have also made use of the swap trick that was suggested by Mark Ransom. Using it improved the performance:

   internal_vector_t tmp;
   internal_vector_t empty;
   internal_vectors.push_back(empty);
   tmp.swap(internal_vectors.back());
  • are you using `insert` or `push_back`? The code says `insert`, the text says `push_back`, and the cost of both is quite different for a vector. – David Rodríguez - dribeas Feb 15 '12 at 19:26
  • When it runs out of capacity, it has to allocate more RAM, obviously. It does that based on an increment value. Both the increment and the initial capacity should be settable afaik. The higher the increment value, the more memory it will allocate each time it runs out. – crush Feb 15 '12 at 19:26
  • just `reserve()` a big chunk (2048?), that should solve the issue... – Karoly Horvath Feb 15 '12 at 19:26
  • I use `push_back`, fixed it in my question –  Feb 15 '12 at 19:26
  • @skwllsp: Then your code sample is not related to your question, and we cannot reliably help further. – Lightness Races in Orbit Feb 15 '12 at 19:28
  • @crush: And how do you set this "increment value", pray tell? In fact you get a geometric increase in the vector's capacity, [usually 1.5x or 2x](http://stackoverflow.com/questions/5404489/standard-container-re-allocation-multipliers-across-popular-toolchains). – Lightness Races in Orbit Feb 15 '12 at 19:29
  • @ Lightness Races in Orbit, I fixed it. Why not? –  Feb 15 '12 at 19:30
  • @skwllsp: If we can't trust the information you give us, then we cannot give an answer that _you_ can trust, and we're all just wasting our time. – Lightness Races in Orbit Feb 15 '12 at 19:32
  • @Lightness Races in Orbit. Sorry. But are there still any contradictions in my question ? –  Feb 15 '12 at 19:33
  • Seems it is 2x in the std. We use a custom implementation that allows you to set the increment in cases where you have a linear growth such as this. – crush Feb 15 '12 at 19:34
  • @crush: while 2x is common, I think any fixed ratio is _technically_ valid. – Mooing Duck Feb 15 '12 at 20:16
  • @skwllsp: Perhaps not, but the fact that one crept in at all indicates that this is not your actual copy/pasted testcase. We just can't trust it! – Lightness Races in Orbit Feb 15 '12 at 21:58
  • @crush: All implementations are "custom" – Lightness Races in Orbit Feb 15 '12 at 21:58
  • If you upgrade to using C++11, rvalue references remove the majority of this copying you're seeing, and you'll most likely see the performance improvement you're looking for. – Clark Gaebel Feb 16 '12 at 04:36

5 Answers


MSVC9 implements something known as "swaptimization" for its Standard containers. It's a weaker version of move semantics. When the outer vector is resized, it will not copy the inner vectors.

However, you'd do best simply upgrading your compiler to MSVC10 or GCC (4.5, I think it is) which will give you move semantics, which makes such operations vastly more efficient. Of course, a std::deque is probably still the smarter container, but move semantics are performance-beneficial in many, many places.

Puppy
  • Is there anything similar to swaptimization in gcc? You mentioned gcc 4.5. –  Feb 16 '12 at 06:25
  • @skwllsp: The proper version is Move Semantics, it's a C++11 feature. You can find it in MSVC10 and in some recent version of GCC, 4.4 or 4.5 – Puppy Feb 16 '12 at 07:37

Each time you insert an internal_vector_t into internal_vectors, it is going to make a copy of the internal_vector_t. This will be true whether you use vector or deque. The standard containers always make a copy of the object you're inserting.

You can eliminate the copying by inserting an empty internal_vector_t and then swap the contents of the inserted object with the one you really wanted to insert.

Occasionally the vector will need to resize itself as it runs out of room during an insertion, which would result in objects being copied again. A deque will eliminate this as long as you're always inserting at the beginning or end.

Edit: The advice I gave above can be summarized with these code changes. This code should avoid all copying of the large vectors.

typedef std::vector<unsigned char> internal_vector_t;
std::deque< internal_vector_t > internal_vectors; 
internal_vector_t empty;

while (fetching lots of records) {
   internal_vector_t tmp;
   // reads 1Mb of chars in tmp...
   internal_vectors.push_back(empty);
   tmp.swap(internal_vectors.back());
   // some more work
}
Mark Ransom
  • Actually I am mainly interested in optimizing this: `Occasionally the vector will need to resize itself as it runs out of room during an insertion` since this is the second most often called function when I profile code. –  Feb 15 '12 at 19:37
  • @skwllsp, `vector` is usually designed to allocate ever greater amounts so that the frequency of copying decreases as the number of items to copy increases. Are you able to tell if the calls are coming from `internal_vector_t` or `internal_vectors`? – Mark Ransom Feb 15 '12 at 19:45
  • Yes, I am able. The calls are coming either from `std::vector< std::vector >::M_fill_insert` or `std::vector::M_fill_insert`. `std::vector< std::vector >::M_fill_insert` takes more time to process. That is why I ask about possible optimization and about using std::deque –  Feb 15 '12 at 19:49
  • @skwllsp, since you're using `push_back` you won't incur any copying overhead with std::deque, so it would appear to be a good choice in your situation. Why haven't you just tried it already to see if it made an improvement? – Mark Ransom Feb 15 '12 at 19:56
  • @skwllsp: I would be shocked if the slow part of `internal_vectors.push_back(tmp);` was the resizing of `internal_vectors`. I can pretty much guarantee it's from copying `tmp`, and `deque` will have the same problem. Use the swap trick Mark suggested, I bet that makes it twice as fast. – Mooing Duck Feb 15 '12 at 20:06
  • This isn't true. MSVC9 implements "swaptimization" for Standard containers, so when the external vector resizes, it won't copy the internal ones. – Puppy Feb 16 '12 at 03:56
  • @DeadMG Have you got any links to read about this "swaptimization"? And what about gcc? –  Feb 16 '12 at 04:05
  • @DeadMG, interesting. I had considered that such an optimization would be possible, but how does the compiler know that the types are swappable? Wouldn't that result in pessimization if the type didn't specialize `std::swap`? – Mark Ransom Feb 16 '12 at 04:32
  • @Mark Ransom, thank you for your idea about `swap`, I've made use of it –  Feb 20 '12 at 07:47

std::deque does not store its elements contiguously - it breaks its storage up into a series of fixed-size "blocks". This means that when a std::deque runs out of capacity it only needs to allocate a new block of constant size - it does not need to reallocate its whole internal buffer and move all of its existing elements.

std::vector, on the other hand, does maintain contiguous storage, so when it runs out of capacity and reallocates, it does need to move all of its existing elements - this can be expensive.

std::vector is "smart" about its reallocation scheme, allocating in chunks according to a geometric series (often doubling or increasing the capacity by 1.5 etc). This means that reallocation doesn't occur often.

std::deque may be more efficient in this case since when reallocation does occur it does less work. As always, you'd have to benchmark to get any real numbers.

Your code could probably be improved further in other areas. It seems that at each iteration of the while loop you're creating a new internal_vector_t tmp. It may be more efficient to declare this outside the loop and just clear() its storage at each iteration. You're also copying the whole tmp vector each time you call internal_vectors.push_back(tmp) - you could probably improve on this by moving the tmp vector via internal_vectors.push_back(std::move(tmp)) (C++11) - this will just copy a few pointers.

Hope this helps.

Darren Engwirda
  • I basically only use `std::deque` for FIFO queues, or if I need a growing container that commonly takes more than half my RAM (very rare). – Mooing Duck Feb 15 '12 at 20:15
  • @MooingDuck: I think there are other use-cases than that. If you have no idea how much space to `::reserve` and the size might end up large (essentially this question) I would look at `std::deque`. It's not only the reallocation cost of `std::vector` that can be an issue, but also the potential for memory fragmentation on repeated realloc's. Typically I've found that `std::deque` can be more efficient at sizes << half RAM, but of course you have to benchmark the particular code you're working with. – Darren Engwirda Feb 15 '12 at 20:36
  • The trick is `deque` tends to make _far more_ allocations than a vector, though it doesn't have the copy, which makes comparison hard. For pushing back 5000 `int`, MSFT's `vector` will do ~19 allocations, `deque` ~1250 allocations. For gcc, that's ~12 and ~39 respectively. But a deque doesn't copy. @skwllsp: profile! – Mooing Duck Feb 15 '12 at 21:27
  • @MooingDuck: It's well known that the (current) MSVC `std::deque` is deeply flawed, as they allocate in crazy small blocks of `16 bytes`, leading to the behaviour that you mention. I don't use the MSVC container for this reason, and I think it's unfair to make generalisations based on one particular std library's implementation issues. – Darren Engwirda Feb 16 '12 at 01:58
  • `deque` is useful, but usually a vector is the right answer. As for generalizations based of one implementation, that's why I listed the _two most common_ implementations. – Mooing Duck Feb 16 '12 at 02:50
  • @Darren Engwirda: Do note that [Dinkumware](http://dinkumware.com/) writes the standard library that MSVC++ uses, so any compiler that uses Dinkumware's standard library implementation would also probably behave the same way. – In silico Feb 16 '12 at 04:01

Are you indexing the outer vector? If not, how about std::list<std::vector<unsigned char> >?

Ben Voigt
  • Yes, I mentioned it in my question. I consider std::deque because I still need accessing by index like internal_vectors[10]. –  Feb 15 '12 at 19:32
  • @skwllsp Do you actually need random access? You might get away with traversing through the list. Just increment a counter and check if one of the elements you need is at that index. I hope I'm making sense – pezcode Feb 15 '12 at 20:00
  • @pezcode `Do you actually need random access?` Not sure. But significant change in this algorithm is not an option, sadly. –  Feb 15 '12 at 20:12

A deque may be more efficient depending on the implementation. Unlike a vector, a deque does not guarantee contiguous storage and can thus allocate several separate blocks of memory. Therefore it can allocate more memory without moving elements already added. You should try it and measure the impact.

rasmus