-3

I created a little performance test comparing the setup and access times of three popular techniques for dynamic allocation: raw pointer, std::unique_ptr, and a std::deque.

EDIT: per @NathanOliver's comment, added std::vector.
EDIT 2: per latedeveloper's comment, allocated with the std::vector(n) and std::deque(n) constructors.
EDIT 3: per @BaummitAugen, moved allocation inside the timing loop, and compiled an optimized version.
EDIT 4: per @PaulMcKenzie's comments, set runs to 2000.

Results: These changes have tightened things up a lot. Deque and Vector are still slower on allocation and assignment, while deque is much slower on access:

pickledEgg$ g++ -std=c++11 -o sp2 -O2 sp2.cpp

Average of 2000 runs:
Method  Assign          Access
======  ======          ======
Raw:    0.0000085643    0.0000000724
Smart:  0.0000085281    0.0000000732
Deque:  0.0000205775    0.0000076908
Vector: 0.0000163492    0.0000000760

Just for fun, here are -Ofast results:
pickledEgg$ g++ -std=c++11 -o sp2 -Ofast sp2.cpp

Average of 2000 runs:
Method  Assign          Access
======  ======          ======
Raw:    0.0000045316    0.0000000893
Smart:  0.0000038308    0.0000000730
Deque:  0.0000165620    0.0000076475
Vector: 0.0000063442    0.0000000699

ORIGINAL: For posterity; note lack of optimizer -O2 flag:

pickledEgg$ g++ -std=c++11 -o sp2 sp2.cpp

Average of 100 runs:
Method  Assign      Access
======  ======      ======
Raw:    0.0000466522    0.0000468586
Smart:  0.0004391623    0.0004406758
Deque:  0.0003144142    0.0021758729
Vector: 0.0004715145    0.0003829193

Updated Code:

#include <iostream>
#include <iomanip>
#include <vector>
#include <deque>
#include <chrono>
#include <memory>

const int NUM_RUNS(2000);

int main() {
    std::chrono::high_resolution_clock::time_point b, e;
    std::chrono::duration<double> t, raw_assign(0), raw_access(0), smart_assign(0), smart_access(0), deque_assign(0), deque_access(0), vector_assign(0), vector_access(0);
    int k, tmp, n(32768);

    std::cout << "Average of " << NUM_RUNS << " runs:" << std::endl; 
    std::cout << "Method " << '\t' << "Assign" << "\t\t" << "Access" << std::endl;
    std::cout << "====== " << '\t' << "======" << "\t\t" << "======" << std::endl;

    // Raw
    for (k=0; k<NUM_RUNS; ++k) {
        b = std::chrono::high_resolution_clock::now();
        int* raw_p = new int[n]; // run-time allocation
        for (int i=0; i<n; ++i) { //assign
            raw_p[i] = i;
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        raw_assign+=t;
        b = std::chrono::high_resolution_clock::now();
        for (int i=0; i<n; ++i) { //access
            tmp = raw_p[i];
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        raw_access+=t;
        delete [] raw_p; // :^)
    }
    raw_assign /= NUM_RUNS;
    raw_access /= NUM_RUNS;
    std::cout << "Raw:   " << '\t' << std::setprecision(10) << std::fixed << raw_assign.count() << '\t' << raw_access.count() << std::endl;

    // Smart
    for (k=0; k<NUM_RUNS; ++k) {
        b = std::chrono::high_resolution_clock::now();
        std::unique_ptr<int []> smart_p(new int[n]); // run-time allocation
        for (int i=0; i<n; ++i) { //assign
            smart_p[i] = i;
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        smart_assign+=t;
        b = std::chrono::high_resolution_clock::now();
        for (int i=0; i<n; ++i) { //access
            tmp = smart_p[i];
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        smart_access+=t;
    }
    smart_assign /= NUM_RUNS;
    smart_access /= NUM_RUNS;
    std::cout << "Smart: " << '\t' << std::setprecision(10) << std::fixed << smart_assign.count() << '\t' << smart_access.count() << std::endl;

    // Deque
    for (k=0; k<NUM_RUNS; ++k) {
        b = std::chrono::high_resolution_clock::now();
        std::deque<int> myDeque(n);
        for (int i=0; i<n; ++i) { //assign
            myDeque[i] = i; // index with i; myDeque[n] is one past the end
//          myDeque.push_back(i);
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        deque_assign+=t;
        b = std::chrono::high_resolution_clock::now();
        for (int i=0; i<n; ++i) { //access
            tmp = myDeque[i];
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        deque_access+=t;
    }
    deque_assign /= NUM_RUNS;
    deque_access /= NUM_RUNS;
    std::cout << "Deque: " << '\t' << std::setprecision(10) << std::fixed << deque_assign.count() << '\t' << deque_access.count() << std::endl;

    // vector
    for (k=0; k<NUM_RUNS; ++k) {
        b = std::chrono::high_resolution_clock::now();
        std::vector<int> myVector(n);
        for (int i=0; i<n; ++i) { //assign
            myVector[i] = i;
//          myVector.push_back(i);
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        vector_assign+=t;
        b = std::chrono::high_resolution_clock::now();
        for (int i=0; i<n; ++i) { //access
            tmp = myVector[i];
//          tmp = *(myVector.begin() + i);
        }
        e = std::chrono::high_resolution_clock::now();
        t = std::chrono::duration_cast<std::chrono::duration<double> >(e - b);
        vector_access+=t;
    }
    vector_assign /= NUM_RUNS;
    vector_access /= NUM_RUNS;
    std::cout << "Vector:" << '\t' << std::setprecision(10) << std::fixed << vector_assign.count() << '\t' << vector_access.count() << std::endl;

    std::cout << std::endl;
    return 0;
}
kmiklas
  • 3
    A `deque` is not like an array. You should compare the performance to a `std::vector`. – NathanOliver Jan 20 '17 at 19:02
  • 1
    Also, please share how you invoke your compiler. A lot of people measure unoptimized/debug builds, which is utterly useless. – Baum mit Augen Jan 20 '17 at 19:06
  • 2
    Use the `std::deque(n)` or `std::vector(n)` constructor. –  Jan 20 '17 at 19:07
  • 2
    To get times anywhere near these, I have to run in debug mode. With optimizations, there is no difference between raw pointer and `unique_ptr`. The `deque` is a lot slower because you include the memory allocations in the timing. – Bo Persson Jan 20 '17 at 19:09
  • 1
    ^Right now, you are measuring the allocation time of the container and not the allocation time of the raw arrays. You must *at least* reserve enough memory beforehand for a somewhat fair comparison. – Baum mit Augen Jan 20 '17 at 19:09
  • @BaummitAugen, ``g++ -std=c++11 -o sp2 sp2.cpp`` – kmiklas Jan 20 '17 at 19:10
  • 3
    Use `-O2` or `-O3` and see what happens – NathanOliver Jan 20 '17 at 19:10
  • @BaummitAugen, allocation (and deallocation) of the raw array occurs within the loop. – kmiklas Jan 20 '17 at 19:13
  • 2
    @kmiklas Unless you're timing an optimized build, the results are meaningless. With over a 2K rep, this should have been a given. – PaulMcKenzie Jan 20 '17 at 19:13
  • 2
    @kmiklas But not within the segment you time. – Baum mit Augen Jan 20 '17 at 19:14
  • @BaummitAugen oops! – kmiklas Jan 20 '17 at 19:15
  • 1
    [Results when compiled with optimizations turned on](http://coliru.stacked-crooked.com/a/2550b6d03cd2ee1f) – PaulMcKenzie Jan 20 '17 at 19:21
  • Updated per the above comments. – kmiklas Jan 20 '17 at 19:39
  • You are still comparing apples to oranges. Timing `std::vector<int> myVector(n);` is not the same as `int* raw_p = new int[n];` since the former requires all the elements in the vector to be set to a default value. Moving the allocations out of the timings (timing allocation normally is not what you want to do) [you get the same results](http://coliru.stacked-crooked.com/a/f12b0fc164e5e556) – NathanOliver Jan 20 '17 at 19:55

2 Answers

1

As you can see from the results, raw pointers are the clear winner in both categories. Why is this?

Because ...

g++ -std=c++11 -o sp2 sp2.cpp

... you didn't enable optimization. Calling an operator overloaded for a non-fundamental type such as std::vector or std::unique_ptr involves a function call. Using an operator of a fundamental type, such as a raw pointer, does not.

A function call is typically slower than no function call. Over several iterations, the small overhead of the function call multiplies. However, an optimizer can expand function calls inline thereby making the disadvantage of non-fundamental types void. But only if the optimization is performed.


std::deque has an additional reason for being slower: The algorithm to access an arbitrary element of a double-ended queue is more complicated than accessing an array. While std::deque has decent random access performance, it is not as good as an array's. A more appropriate use case for std::deque is linear iteration (using an iterator).

Furthermore, you used std::deque::at, which does bounds checking. The subscript operator does not do bounds checking. Bounds checking adds runtime overhead.


The slight edge that the raw array appears to have over std::vector in allocation speed may be because std::vector zero-initializes the data.

eerorika
  • "You'd get better performance by using an iterator." <-- How do you suggest I use an iterator for random access? That's part of what I'm timing. – kmiklas Jan 20 '17 at 19:50
  • @kmiklas You could use iterator for random access, but that's not what I mean. I'm saying that `std::deque` has slower random access than an array. Linear iteration is a more appropriate use case for `std::deque`. I'll clarify. – eerorika Jan 20 '17 at 19:52
-2

A std::deque is a doubly linked list. myDeque.at(i) has to walk through the first i elements on every call. That is why the access to the deque is so slow.

The initialization of std::vector is slow because you don't preallocate enough memory. std::vector then starts with a small number of elements and usually doubles that as soon as you try to insert more. This reallocation involves calling the move constructor for all elements. Try to construct the vector like this:

std::vector<int> myVector{n};

In the vector access, I wonder why you didn't use tmp = myVector[i]. Instead of calling the index operator, you instantiate an iterator, call its + operator, and then call the dereference operator on the result. Since you are not optimizing, the function calls will probably not be inlined, which is why std::vector access is slower than the raw pointer.

For std::unique_ptr, I suppose the reasons are similar to those for std::vector. You always call the index operator on the unique pointer, which is a function call as well. As an experiment, can you please call smart_p.get() immediately after allocating the memory for smart_p, and use the resulting raw pointer for the rest of the operations? I assume it will be just as fast as the raw pointer, which would support my assumption that the function calls are the cause. Then the simple advice is: enable optimizations and try again.

kmiklas edit:

Average of 2000 runs:
Method  Assign          Access
======  ======          ======
Raw:    0.0000086415    0.0000000681
Smart:  0.0000081824    0.0000000670
Deque:  0.0000204542    0.0000076554
Vector: 0.0000164252    0.0000000678
kmiklas
cdonat
  • "can you please try and immediately after allocating the memory for smart_p, call smart_p.get()" <---- Done, edited your post with results. With this change, unique_ptr was faster! – kmiklas Jan 20 '17 at 19:44
  • 2
    `std::deque` is not a plain linked list. `std::deque::at` does not have to walk through the first `i` elements. That would imply that the complexity of random access was linear. But the standard requires it to have constant complexity. – eerorika Jan 20 '17 at 19:44
  • 2
    Saying *A std::deque is a doubly linked list* is not quite right. It is a list of chunks of elements. Otherwise it would be the same as `std::list`. – NathanOliver Jan 20 '17 at 19:44
  • std::vector<int> myVector{n}; produces a segmentation fault. Is std::vector<int> myVector(n) acceptable? – kmiklas Jan 20 '17 at 19:48
  • Ah, I see. `vector` is a special case. Yes, that is acceptable, of course. As your results show, the function calls make the difference. So when you switch on optimizations, the compiler can inline them all. You'll probably see no big difference between raw and `unique_ptr`. I think what you see now is mostly statistical noise. `vector` will probably be slightly slower, but not much. I wouldn't expect you to be able to distinguish the difference from statistical noise. – cdonat Jan 20 '17 at 19:58
  • @NathanOliver Yes, you are correct. I'm sorry. Anyway, deque needs a more complex data structure than std::vector to fulfill its guarantees. The complexity is O(1), but the constant factor is bigger. That is, I think, the reason for the much slower measured access. Thanks for correcting me. – cdonat Jan 20 '17 at 20:05
  • @user2079303 Of course I include you in my thanks :-) – cdonat Jan 20 '17 at 20:06