10

I wrote a simple C++ program to compare the speed of sorting the same data stored first in a list and then in a vector.

Sorting the list takes 27 seconds, while sorting the vector takes 10 seconds. Why the huge performance gap? Aren't the algorithms used to sort the list and the vector the same, namely merge sort?

EDIT: I may be wrong on the last point. As far as I know, textbooks that describe sorting algorithms theoretically seem to use the word "list" in the sense of a std::vector. I don't know how sorting algorithms for vectors would differ from sorting algorithms for lists, so if someone could clarify, that would be really helpful. Thank you.

 //In this code we compare the sorting times for lists and vectors.
//Both contain a sequence of structs

#include <iostream>
#include <vector>
#include <list>
#include <algorithm>
#include <time.h>
#include <math.h>
#include <stdlib.h>
#include <iomanip>
using namespace std;


struct particle
{
  double x;
  double y;
  double z;
  double w;

    bool operator<(const particle& a) const
    {
        return x < a.x;
    }

};


int main(int argc, char *argv[])
{
  int N=20000000;
  clock_t start,stop;

  vector<particle> myvec(N);
  vector<particle>::iterator cii;
  //Set vector values
  for (cii = myvec.begin(); cii != myvec.end(); ++cii)
  {
    cii->x =1.0*rand()/RAND_MAX;
    cii->y =1.0*rand()/RAND_MAX;
    cii->z =1.0*rand()/RAND_MAX;
    cii->w =1.0*rand()/RAND_MAX;
 }


  list<particle> mylist(N);
  list<particle>::iterator dii;

   //Set list values
  for (cii = myvec.begin(), dii = mylist.begin(); dii != mylist.end() && cii != myvec.end(); ++dii, ++cii)
  {
    dii->x = cii->x;
    dii->y = cii->y;
    dii->z = cii->z;
    dii->w = cii->w;
  }


  //Sort the vector 

  start=clock();
  sort(myvec.begin(),myvec.end());
  stop=clock();
  cout<<"Time for sorting vector "<<(stop-start)/(double) CLOCKS_PER_SEC<<endl;



  //Sort the list
  start=clock();
  mylist.sort();
  stop=clock();
  cout<<"Time for sorting list "<<(stop-start)/(double) CLOCKS_PER_SEC<<endl;



  return 0;
}
smilingbuddha
  • This is one of the reasons why we don't use `list` for everything. You have to pick a data structure that fits into your usage such that you maximise its strengths and minimise its weaknesses. Sorting a list (and the more general problem, random access) is one of its weaknesses. – Seth Carnegie Dec 12 '11 at 21:50
  • 3
    You should try an example where copying your structure is expensive! –  Dec 12 '11 at 22:01

5 Answers

8

No, std::sort on a std::vector does not use merge sort in most implementations; the standard doesn't specify the algorithm.

std::list does not have O(1) random access, so it cannot use algorithms such as quick sort*, which require O(1) random access to be fast (this is also why std::sort doesn't work on std::list).

Instead, you have to use an algorithm for which forward iteration is enough, such as merge sort**.

And merge sort is typically slower [1][2].
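
To make the distinction concrete, here is a minimal sketch of the two calls side by side. The function names `sorted_via_vector` and `sorted_via_list` are made up for illustration; which algorithm runs underneath each call depends on the standard library implementation.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Sorts a copy of the input with std::sort, which needs random-access iterators.
std::vector<double> sorted_via_vector(std::vector<double> values)
{
    std::sort(values.begin(), values.end());
    return values;
}

// Sorts the same values held in a std::list with the member function
// list::sort, which only walks and relinks nodes.
std::list<double> sorted_via_list(const std::vector<double>& values)
{
    std::list<double> nodes(values.begin(), values.end());
    // std::sort(nodes.begin(), nodes.end());  // would not compile: list iterators are bidirectional only
    nodes.sort();
    return nodes;
}
```

Both functions produce the same ordering; only the iterator requirements and the work done per element differ.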

See also: what's the difference between list.sort and std::sort?

*: libstdc++ actually uses introsort.
**: libstdc++ actually uses a variant of merge sort

kennytm
  • Just a small quibble: You can implement quick sort using std::list. During the partitioning phase, choose the first element of the range as your pivot. Create two new lists to represent the two halves and use list::splice to move elements from the original list into the appropriate partitions. I have no idea if any STL implementations actually do this. The big disadvantage is that you don't have any defence against the sorted list worst case. In any event, the point is that this works using forward only iterators and is still O(N lg N). – Peter Ruderman Sep 24 '13 at 20:50
  • @PeterRuderman: "no idea if any STL implementations actually do this" - I can't imagine it... "Create two new lists" is significantly slower and more wasteful of memory than creating a contiguous index for the original list (a vector of pointers/iterators), which once done would allow random access and faster sorting. – Tony Delroy Mar 31 '14 at 01:41
  • @Tony D That's not quite true, Tony. You can use list::splice to move nodes between lists just by updating the internal pointers. No additional memory is required. – Peter Ruderman Mar 31 '14 at 17:57
  • @PeterRuderman: oh sorry - when you said "new lists" my mind locked onto copying, but you did mention `splice`. Fair enough then. Cheers. – Tony Delroy Apr 01 '14 at 01:21
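
The splice-based quick sort discussed in the comments above could look roughly like the following. This is only a sketch of the idea, not something any standard library is known to ship; `list_quick_sort` is a hypothetical name, the pivot is simply the first element, and it assumes C++11 so that `size()` is O(1).

```cpp
#include <list>

// Hypothetical helper: quick sort on std::list using only splice,
// so no element is copied and no node is allocated.
template <typename T>
void list_quick_sort(std::list<T>& items)
{
    if (items.size() < 2) return;

    // Move the first node out to act as the pivot.
    std::list<T> pivot;
    pivot.splice(pivot.begin(), items, items.begin());

    // Partition the remaining nodes by relinking them one at a time.
    std::list<T> smaller, rest;
    while (!items.empty()) {
        std::list<T>& target = (items.front() < pivot.front()) ? smaller : rest;
        target.splice(target.end(), items, items.begin());
    }

    list_quick_sort(smaller);
    list_quick_sort(rest);

    // Reassemble in order: smaller, pivot, rest.
    items.splice(items.end(), smaller);
    items.splice(items.end(), pivot);
    items.splice(items.end(), rest);
}
```

As noted in the comment, an already sorted input is the worst case for this pivot choice: the work becomes quadratic and the recursion depth grows linearly with the list length.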
5

A vector packs things closer in memory than a list does. This results in a more cache-friendly access pattern during sorting.
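
If the data has to live in a list, one way to borrow the vector's memory layout for the sort itself is to copy the elements into contiguous storage, sort there, and write them back. This is a sketch under the assumption that a temporary copy of all N elements fits in memory; `sort_list_via_vector` is a made-up name, and unlike list::sort the result is not a stable sort.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Hypothetical helper: sort a list by round-tripping through contiguous storage.
template <typename T>
void sort_list_via_vector(std::list<T>& items)
{
    std::vector<T> buffer(items.begin(), items.end());      // contiguous copy of every element
    std::sort(buffer.begin(), buffer.end());                // sort on tightly packed memory
    std::copy(buffer.begin(), buffer.end(), items.begin()); // write values back into the existing nodes
}
```

Whether this beats list::sort depends on the element type and the machine, so it is worth timing rather than assuming.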

David Schwartz
  • 1
    It's not just memory efficiency. – Matt Ball Dec 12 '11 at 21:49
  • 1
    I agree. But that's a *huge* factor for small objects. – David Schwartz Dec 12 '11 at 21:50
  • 1
    I wrote a custom allocator once that allocated things in memory for `list` side by side like `vector` would do, to test the performance. It was still dog-slow for sorting and iterating compared to `vector`. – Seth Carnegie Dec 12 '11 at 21:52
  • 2
    @SethCarnegie You still didn't pack them as closely as a vector does. A vector will pack an n-byte object every n-bytes. A list just won't pack them that tightly regardless of the allocator used. Also, the lack of accessing pointers makes the access pattern more cache friendly as well and is a direct consequence of how closely packed they are. – David Schwartz Dec 12 '11 at 22:05
  • @Seth Carnegie & @David Schwartz Does writing a custom allocator change the way the list accesses or iterates? If not then you can pack data as tightly as you want it's not going to change the number of pointer lookups. The compiler has to be told in some way that the data is contiguous in order to take advantage of it. A vector probably wraps something like `data = new T[43]`. There are 43 valid pointers to be had but the compiler won't know to 'load' the whole thing unless you have something like `data[6]` or `data + 6`. – Trygve Skogsholm Jul 22 '16 at 21:36
4

I'm really not a C++ programmer, but my understanding is that std::vector has different performance characteristics from std::list. Specifically (as @Martinho commented):

std::vector has O(1) random access, while std::list doesn't.


From cplusplus.com (I'm sure there are less sketchy references out there, feel free to chime in):

Vectors are good at:

  • Accessing individual elements by their position index (constant time).
  • Iterating over the elements in any order (linear time).
  • Add and remove elements from its end (constant amortized time).

and:

...Advantages to list containers:

  • Efficient insertion and removal of elements anywhere in the container (constant time).
  • Efficient moving elements and block of elements within the container or even between different containers (constant time).
  • Iterating over the elements in forward or reverse order (linear time).
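
The difference in random access can be seen in how the n-th element is reached in each container. The helper names below are hypothetical, and `std::next` assumes C++11.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// O(1): index arithmetic on contiguous storage.
double nth_of_vector(const std::vector<double>& v, std::size_t n)
{
    return v[n];
}

// O(n): std::next has to follow n links, because list iterators
// are bidirectional rather than random access.
double nth_of_list(const std::list<double>& l, std::size_t n)
{
    return *std::next(l.begin(), static_cast<std::ptrdiff_t>(n));
}
```

Both helpers assume `n` is a valid position; neither performs bounds checking.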
Matt Ball
  • 3
    You're close: `std::vector` has O(1) random access, while `std::list` doesn't (it's a doubly-linked list). – R. Martinho Fernandes Dec 12 '11 at 21:47
  • It might be worth noting that while both `vector` and `list` take constant time per step when iterating and constant time (amortized, for `vector`) to insert at the back, `vector` has a smaller constant. – Seth Carnegie Dec 12 '11 at 21:58
  • This doesn't quite address sorting. Sorting a list doesn't use random access and there is a theoretical benefit that it doesn't need to swap values. - Having random access can allow for better sorting algorithms, though? – UncleBens Dec 12 '11 at 22:07
  • 2
    @UncleBens: Sorting goes a lot faster with random access. Plus, the cache behaviour of a linked list is absurdly poor. – Puppy Dec 12 '11 at 22:18
  • @DeadMG: Not entirely convincing. It also depends on the kind of thing you are sorting: http://codepad.org/xj6g6z6w. And list::sort is still O(n log n), even though it might be a bit worse for types that can be inexpensively swapped. – UncleBens Dec 12 '11 at 23:59
4

list::sort and std::sort on vectors don't use the same algorithm.

std::sort uses a sorting algorithm that requires random-access iterators, such as the ones provided by std::vector, but not by std::list.

list::sort is specialized for lists; it usually implements a merge sort, which does not require random access.
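
The iterator requirement can be checked at compile time. The sketch below assumes C++11; `category_of` is a made-up alias, not part of the standard library.

```cpp
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// Iterator category reported by a container's iterator type.
template <typename Container>
using category_of =
    typename std::iterator_traits<typename Container::iterator>::iterator_category;

// vector iterators are random access, which is what std::sort requires.
static_assert(std::is_same<category_of<std::vector<int>>,
                           std::random_access_iterator_tag>::value,
              "vector iterators are random access");

// list iterators are only bidirectional, hence the separate list::sort member.
static_assert(std::is_same<category_of<std::list<int>>,
                           std::bidirectional_iterator_tag>::value,
              "list iterators are bidirectional");
```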

The total number of comparisons is O(n log n) for both algorithms (I say that without knowing the exact algorithm used by my compiler's std::sort implementation). The total number of swaps is O(n log n) as well, but for std::sort, that means O(n log n) calls to copy constructor/assignment operator, whereas for list::sort, it's a pointer operation. Your structure is way too small for this advantage to pay off. I assume that as soon as you put something with a non-trivial copy constructor into the struct (maybe a std::string is enough), std::list will win.

EDIT: One std::string member initialised with a random double converted to text seems to be about the break-even point on my machine (x86_64-linux, gcc 4.6.2)
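
A struct along the lines described above might look like this. It is a sketch, not the code that was actually measured: `heavy_particle`, `label`, and `sort_heavy` are hypothetical names, and the exact struct used for the break-even test is not shown in this answer.

```cpp
#include <algorithm>
#include <list>
#include <string>
#include <vector>

// Hypothetical variant of the question's struct with a member that is
// costlier to copy than four doubles.
struct heavy_particle
{
    double x;
    std::string label;

    bool operator<(const heavy_particle& other) const { return x < other.x; }
};

// std::sort rearranges the elements themselves, so each rearrangement
// copies or moves the std::string member.
void sort_heavy(std::vector<heavy_particle>& v) { std::sort(v.begin(), v.end()); }

// list::sort relinks nodes; no heavy_particle is copied or moved.
void sort_heavy(std::list<heavy_particle>& l) { l.sort(); }
```

With a C++11 or later compiler, std::sort moves elements instead of copying them, which makes rearranging a std::string member cheaper, so the break-even point may differ from the one reported here.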

wolfgang
  • Do you mean you get a better performance with lists with a string member in the struct for N=20000000? – smilingbuddha Dec 13 '11 at 01:45
  • @smilingbuddha Actually, I get roughly equal performance for lists and vectors for N=2000000. Note the missing 0, I ran out of memory. Asymptotic performance being equal, that shouldn't matter too much. – wolfgang Dec 13 '11 at 07:52
1

Vectors allow constant-time element swapping as well as constant-time random access. Lists take linear time for random access, and a swap has (probably) a touch more overhead because of the pointer updates. My guess is that the sort is doing a bunch of swaps. Also, vectors are more efficient at moving large parts of themselves around in memory.

I'd be curious whether sorting an slist<> would be faster than sorting a list, given the slightly lower pointer overhead.
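
For anyone who wants to try this, slist<> is a non-standard extension; the sketch below assumes the C++11 std::forward_list is an acceptable stand-in, and `sorted_singly_linked` is a made-up helper name. It only sets up the singly linked version and makes no claim about which one is faster.

```cpp
#include <forward_list>
#include <list>

// std::forward_list (C++11) is the standard singly linked list; each node
// carries one link instead of two.
template <typename T>
std::forward_list<T> sorted_singly_linked(const std::list<T>& source)
{
    std::forward_list<T> singly(source.begin(), source.end());
    singly.sort(); // member sort, analogous to list::sort
    return singly;
}
```

Timing `singly.sort()` against `list::sort` on the same data, using the same `clock()` calls as in the question, would answer the question for a particular machine.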

Michael Dorgan