
The function is defined as

void bucketsort(Array& A){
  size_t numBuckets=A.size();
  iarray<List> buckets(numBuckets);

  //put in buckets
  for(size_t i=0;i!=A.size();i++){
    buckets[int(numBuckets*A[i])].push_back(A[i]);
  }

  ////get back from buckets
  //for(size_t i=0,head=0;i!=numBuckets;i++){
  //size_t bucket_size=buckets[i].size();
  //for(size_t j=0;j!=bucket_size;j++){
  //  A[head+j] = buckets[i].front();
  //  buckets[i].pop_front();
  //}
  //head += bucket_size;
  //}
  for(size_t i=0,head=0;i!=numBuckets;i++){
    while(!buckets[i].empty()){
      A[head] = buckets[i].back();
      buckets[i].pop_back();
      head++;
    }
  }

  //insertion sort
  insertionsort(A);
}

where `List` is just `std::list<double>` from the STL.

The contents of the array are generated randomly in [0,1). Theoretically, bucket sort should be faster than quicksort for large sizes since it's O(n), but it fails, as shown in the following graph.

[graph: running time of bucket sort vs. STL sort]

I used google-perftools to profile it on an array of 10,000,000 doubles. It reports as follows:

[profiler output]

It seems I should not use an STL list, but I wonder why? What does std::_List_node_base::_M_hook do? Should I write a list class myself?

PS: The experiment and improvement
I have tried leaving only the code that puts elements into buckets, and this showed that most of the time is spent building up the buckets.
The following improvements were made: - Use an STL vector as the buckets and reserve reasonable space for them - Use two helper arrays to store the information used in building the buckets, thus avoiding the linked list entirely, as in the following code

void bucketsort2(Array& A){
  size_t    numBuckets = ceil(A.size()/1000.0);//note: with integer division the ceil would be a no-op
  Array B(A.size());
  IndexArray    head(numBuckets+1,0),offset(numBuckets,0);//extra end of head is used to avoid checking of i == A.size()-1

  for(size_t i=0;i!=A.size();i++){
    head[int(numBuckets*A[i])+1]++;//Note the +1
  }
  for(size_t i=2;i<numBuckets;i++){//head[1] is right already
    head[i] += head[i-1];
  }

  for(size_t i=0;i<A.size();i++){
    size_t  bucket_num         = int(numBuckets*A[i]);
    B[head[bucket_num]+offset[bucket_num]] = A[i];
    offset[bucket_num]++;
  }
  A.swap(B);

  //insertionsort(A);
  for(size_t i=0;i<numBuckets;i++)
    quicksort_range(A,head[i],head[i]+offset[i]);
}
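For readers without the asker's `Array`, `IndexArray`, and `quicksort_range`, here is a self-contained sketch of the same two-pass counting scheme, assuming input in [0,1) and using `std::sort` as a stand-in for the per-bucket quicksort:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Two-helper-array bucket sort: a counting pass fills head[] with bucket
// sizes (shifted by one), a prefix sum turns head[] into start offsets,
// then elements are scattered into B and each bucket range is sorted.
void bucketsort2_sketch(std::vector<double>& A) {
    const std::size_t n = A.size();
    if (n == 0) return;
    const std::size_t k = std::max<std::size_t>(1, n / 1000); // bucket count, as in the question

    std::vector<std::size_t> head(k + 1, 0), offset(k, 0);
    for (double x : A) head[static_cast<std::size_t>(k * x) + 1]++; // note the +1
    for (std::size_t i = 1; i <= k; ++i) head[i] += head[i - 1];    // prefix sum: head[b] = start of bucket b

    std::vector<double> B(n);
    for (double x : A) {
        std::size_t b = static_cast<std::size_t>(k * x);
        B[head[b] + offset[b]++] = x;
    }
    A.swap(B);

    for (std::size_t i = 0; i < k; ++i)
        std::sort(A.begin() + head[i], A.begin() + head[i] + offset[i]);
}
```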

The results are in the following graph [graph], where lines starting with "list" use lists as buckets, lines starting with "vector" use vectors as buckets, and lines starting with "2" use the helper arrays. By default insertion sort is used at the end; some variants use quicksort since the bucket sizes are big.
Note that "list" and "list, only put in", as well as "vector, reserve 8" and "vector, reserve 2", nearly overlap.
I will try small sizes with enough memory reserved.

luoq
    The O-bounds are defined asymptotically. In real life, there is always the constant factor to consider. – Peter G. Oct 17 '10 at 15:01
  • Shouldn't there be less buckets? Say `A.size() / some_const` ? Or a fixed number (10, 100)? – ruslik Oct 17 '10 at 15:05
  • To Peter G.: That's almost true, but I don't think so in this case. I think the size is big enough, and most importantly the increase in time is not O(n) but about O(n^1.26). – luoq Oct 17 '10 at 15:06
  • I would in this case not write any sorting myself but use the quickest solution seen so far: STL's sort – Peter G. Oct 17 '10 at 15:08
  • It looks to me like the increase in time is linear for bucket sort and not quite linear for STL sort. – Puppy Oct 17 '10 at 15:09
  • To Robert Harvey: My reputation was just 9 then, so I could not post images. Now I'm working on making them inline. I am not familiar with the edit system. – luoq Oct 17 '10 at 15:10
  • To ruslik: The run time is roughly the same when using fewer buckets, so I just make it n – luoq Oct 17 '10 at 15:12
  • @luoq well, with `some_const == 1` it ceases to be bucket sort.. – ruslik Oct 17 '10 at 15:14
  • To Peter G: A rough regression shows STL sort is O(n^1.1) and bucket sort is O(n^1.26) – luoq Oct 17 '10 at 15:14
  • To Peter G:This is homework for algorithm design so I must do it myself – luoq Oct 17 '10 at 15:15
  • +1 just for the nice graph and visual context: wishing every question went to this much trouble – John K Oct 18 '10 at 04:52
  • @John K: I agree, this question is a model of presentation. – Matthieu M. Oct 18 '10 at 07:22

5 Answers


In my opinion, the biggest bottleneck here is memory management functions (such as new and delete).

Quicksort (of which STL probably uses an optimized version) can sort an array in-place, meaning it requires absolutely no heap allocations. That is why it performs so well in practice.

Bucket sort relies on additional working space, which is assumed to be readily available in theory (i.e. memory allocation is assumed to take no time at all). In practice, memory allocation can take anywhere from (large) constant time to linear time in the size of memory requested (Windows, for example, will take time to zero the contents of pages when they are allocated). This means standard linked list implementations are going to suffer, and dominate the running time of your sort.

Try using a custom list implementation that pre-allocates memory for a large number of items, and you should see your sort running much faster.
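One way to realize this suggestion, sketched with illustrative names (this is not the asker's `iarray`/`List` interface): thread the elements through two index arrays allocated up front, so the bucketing phase performs no per-node heap allocation at all.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// "Pre-allocated list" bucket sort: head[b] is the index of the most
// recently placed element of bucket b, next[i] chains to the element
// placed before it in the same bucket. Only three O(n) allocations total,
// instead of one heap allocation per list node.
void bucketsort_pooled(std::vector<double>& A) {
    const std::size_t n = A.size();
    if (n == 0) return;
    const std::size_t NONE = static_cast<std::size_t>(-1);
    std::vector<std::size_t> head(n, NONE), next(n);

    for (std::size_t i = 0; i < n; ++i) {
        std::size_t b = static_cast<std::size_t>(n * A[i]); // A[i] in [0,1)
        next[i] = head[b];
        head[b] = i;
    }

    std::vector<double> out;
    out.reserve(n);
    for (std::size_t b = 0; b < n; ++b)
        for (std::size_t i = head[b]; i != NONE; i = next[i])
            out.push_back(A[i]);

    A.swap(out);
    std::sort(A.begin(), A.end()); // stand-in for the question's final insertion sort
}
```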

casablanca
  • I have tried using vectors as buckets (using push_back/pop_back, and reserving space for two doubles); the code runs faster than using lists, but putting into buckets still consumes most of the time. The problem is that some buckets will have larger contents. Pre-allocating that much for each bucket would waste a lot of memory and time. And for now I do not know the distribution of the size of the largest bucket. – luoq Oct 18 '10 at 04:57
  • Those are exactly the conditions that bucket sort requires to perform well: it assumes you have enough extra space readily available. – casablanca Oct 18 '10 at 05:26
  • And that is also the reason why bucket sort is impractical for large data sets. – casablanca Oct 18 '10 at 05:33

Linked lists are not arrays. They are substantially slower for operations like lookup. The STL sort may well have a specific version for lists that takes this into account and optimizes for it, but your function blindly ignores what container it's using. You should try using an STL vector as your array.
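A sketch of this advice, with `std::vector` both for the array and for the buckets (the reserve size is illustrative, and the final `std::sort` stands in for the question's insertion sort):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Bucket sort using vectors as buckets; reserving a small capacity up
// front lets most push_backs skip reallocation.
void bucketsort_vec(std::vector<double>& A) {
    const std::size_t numBuckets = A.size();
    std::vector<std::vector<double>> buckets(numBuckets);
    for (auto& b : buckets) b.reserve(2); // illustrative reserve size

    for (double x : A)  // x assumed in [0,1)
        buckets[static_cast<std::size_t>(numBuckets * x)].push_back(x);

    std::size_t head = 0;
    for (const auto& b : buckets)
        for (double x : b) A[head++] = x;

    std::sort(A.begin(), A.end()); // input is nearly sorted at this point
}
```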

Puppy
  • The STL list is used just as a bucket, and only push_back, front, and pop_front are used, which should take constant time. And the only sort is insertionsort(A), where A is actually an array of doubles. – luoq Oct 17 '10 at 15:20
  • @luoq: You are calling `size()` on the lists, which is O(N) rather than O(1). – Oliver Charlesworth Oct 17 '10 at 15:49
  • @Oli Charlesworth : should be O(1) in a good implementation. Also, he has as many buckets as elements in the array. Each list should be of length ~1 if the distribution is uniform. – Loïc Février Oct 17 '10 at 15:57
  • @Oli Charlesworth: The complexity of size is "Constant (recommended). Linear in some implementations." So this may make the function slow. But even if it's O(n), the total extra time should be O(size of A). Whatever, I'll try not to use size. – luoq Oct 17 '10 at 16:00
  • @luoq : you can use a `while(!buckets[i].empty())` ;) – Loïc Février Oct 17 '10 at 16:02
  • Using while(!buckets[i].empty()) instead of size(), the time is roughly the same. It seems a little faster with large n. – luoq Oct 17 '10 at 16:17
  • @Loic: O(1) `size()` conflicts with O(1) `splice()`, so maybe, maybe not. The GNU implementation is an example with O(N) time (it calls `std::distance()`). – Oliver Charlesworth Oct 17 '10 at 16:22
  • @Oli Charlesworth : don't see any conflict. Splice works with iterators, should not need size(). Where is the conflict? Note that the third version of splice is not O(1). See http://www.cplusplus.com/reference/stl/list/splice/ – Loïc Février Oct 17 '10 at 16:32
  • @Loic: Precisely. Some implementers choose O(1) `size`, some choose O(1) `splice(iterator, x, first, last)`. MSVC went for the former, GNU for the latter. – Oliver Charlesworth Oct 17 '10 at 16:36
  • Alright. There was no indication that splice could be O(1) in some implementations. – Loïc Février Oct 17 '10 at 16:43
  • @Loïc Février: check out the SGI docs, they document function complexity. If you look at the "new members" section, all versions of `splice`, including the "range" version, are documented as "This function is constant time." To be constant time, one must not iterate over that range! See http://www.sgi.com/tech/stl/List.html – André Caron Oct 17 '10 at 16:57

With

iarray<List> buckets(numBuckets);

you are basically creating a LOT of lists, and that can cost you a lot, especially in memory access, which is theoretically linear but in practice is not.

Try to reduce the number of buckets.

To verify my assertion, measure your code's speed with only the creation of the lists.

Also, to iterate over the elements of the lists, you should not use .size() but rather

//get back from buckets
for(size_t i=0,head=0;i!=numBuckets;i++)
  while(!buckets[i].empty())
  {
    A[head++] = buckets[i].front();
    buckets[i].pop_front();
  }

In some implementations .size() can be O(n). Unlikely, but...


After some research I found this page explaining what the code for std::_List_node_base::hook is.

Seems it only inserts an element at a given place in a list. Shouldn't cost a lot.

Loïc Février
  • size() seems constant in my environment (GCC); I'll try your first idea – luoq Oct 17 '10 at 16:19
  • Even if constant, the "correct" way to take all elements from a list is with empty/front/pop. – Loïc Février Oct 17 '10 at 16:33
  • It cannot explain that much time. I ran the function with just initializing the buckets (keeping only the first two lines); from n=1000 to 4096000, the runtime is 2%-5% of the original. – luoq Oct 17 '10 at 16:35
  • Ok. Try that + "put in buckets" and not "get back". To see in which lines is the problem. – Loïc Février Oct 17 '10 at 16:37
  • I will try using an array to store the sizes of the buckets and directly copy the array to a new one according to the size information. Thus the use of a linked list is avoided. I will check the run time then. – luoq Oct 17 '10 at 16:39
  • @luoq : since we do not know what std_List_node_base_M_hook is, a simple thing to do would be to see where it is used. – Loïc Février Oct 17 '10 at 16:44
  • That's it. After adding "put in buckets" the time is in fact longer than doing the complete sort, due maybe to some random factor. – luoq Oct 17 '10 at 16:54
  • @luoq : ok, but that does not explain why. No cache hits in memory access, maybe? Have you tried push_front? Because pop() does not seem to take much time, it should cost you as much as push_back. – Loïc Février Oct 17 '10 at 17:09
  • It's std::_List_node_base::_M_hook instead of std_List_node_base_M_hook, which is defined (or declared) at line 87 of /usr/include/c++/4.5.1/bits/stl_list.h with GCC 4.5.1. But I cannot understand that file. – luoq Oct 17 '10 at 17:10
  • See the update to my answer. You have the code of the function. Basically you have a doubly-linked list, and that function updates next/prev of the elements when a new one is inserted. – Loïc Février Oct 17 '10 at 17:13
  • push_front doesn't make much difference. The difference between push and pop may be that push requires allocation of memory, which apparently does not take constant time. I will try using vectors as buckets and reserving some space for each. – luoq Oct 18 '10 at 00:14
  • Using vectors as buckets makes the code faster; it seems O(n) now – luoq Oct 18 '10 at 00:40
  • @luoq: that's normal, `list` is the slowest container in the STL. It's only useful for its iterator invalidation properties (and from time to time the `splice` operation). Don't use it if possible. – Matthieu M. Oct 18 '10 at 07:23

I think perhaps the interesting question is: why are you creating an inordinately large number of buckets?

Consider the input {1,2,3}, numBuckets = 3. The loop containing buckets[int(numBuckets*A[i])].push_back(A[i]); is going to unroll to

buckets[3].push_back(1);  
buckets[6].push_back(2);  
buckets[9].push_back(3);  

Really? Nine buckets for three values...

Consider if you passed a permutation of the range 1..100. You'd create 10,000 buckets and only use 1% of them. ... and each of those unused buckets requires creating a List in it. ... and has to be iterated over and then discarded in the readout loop.

Even more exciting, sort the list 1..70000 and watch your heap manager explode trying to create 4.9 billion Lists.
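If the input really did span an arbitrary range like 1..100, the bucket index would have to rescale the value first. A sketch of such an index computation (the names and the clamping choice are illustrative, not from the question):

```cpp
#include <algorithm>
#include <cstddef>

// Map x in [lo, hi] to a bucket index 0..k-1. The std::min clamp ensures
// x == hi lands in the last bucket instead of one past the end.
std::size_t bucket_index(double x, double lo, double hi, std::size_t k) {
    std::size_t b = static_cast<std::size_t>((x - lo) / (hi - lo) * k);
    return std::min(b, k - 1);
}
```

With this, the input {1,2,3} over the range [1,3] with 3 buckets lands in buckets 0, 1, and 2 rather than 3, 6, and 9.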

Eric Towers
  • The contents of the array are generated randomly in [0,1), so just numBuckets buckets are created and all elements can be put in. – luoq Oct 18 '10 at 04:47

I didn't really manage to get into the details of your code, as I don't know enough Java at this point in my studies, though I have had some experience in algorithms and C programming, so here's my opinion:

Bucket sort assumes a fair distribution of the elements in the array; that's actually a condition for your bucket sort to run in O(n). Notice that in the worst case you may put a major portion of the elements into one of your buckets, and in the next iteration you are dealing with almost the same problem you were trying to fix in the first place, which leads to bad performance.

Notice that the ACTUAL time complexity of bucket sort is O(n+k), where k is the number of buckets. Did you count your buckets? Is k = O(n)?

The most time-wasting problem in bucket sort is the empty buckets left after the partition into buckets is done: when concatenating your sorted buckets, you can't tell whether a bucket is empty without actually testing it.

Hope I helped.

Ofek Ron