How to optimize this Radix-Sort algorithm in C++?

Question

I am working with this assignment of optimizing a radix sort code in C++ and I need to reduce the execution time, my code is working and it looks like this:

void RadixSort::RadixSortNaive(std::vector<long> &Arr) {

long Max_Value = findMax(Arr);

    int Max_Radix = 1;
while (1) {
  if (Max_Radix >= Max_Value) break;
  Max_Radix = Max_Radix*radix_;
}

for (int i = 1; i < Max_Radix; i = i*radix_) {
  for (int j = 0; j < key_length_; j++) {
    int K;
    if (Arr[j] < i) K = 0;
    else K = (Arr[j] / i) % radix_;
    Q[K].push(Arr[j]);
  }

  int idx = 0;
  for (int j = 0; j < radix_; j++) {
    while (Q[j].empty() == 0) {
      Arr[idx] = Q[j].front();
      Q[j].pop();
      idx++;
    }
  }
}
class RadixSort{
public :

  void setConfig(int key_length, int radix) {
    key_length_ = key_length;
    radix_ = radix;
    for (int i = 0; i < radix_; i++) {
      Q.push_back(std::queue<long>());
    }
  }

  long findMax(std::vector<long> Arr) const {
    long Max = 0;
    for (int i = 0; i < key_length_; i++) {
      if (Max < Arr[i])
        Max = Arr[i];
    }
    return Max;
  }

  void RadixSortNaive(std::vector<long> &Arr);
  void RadixSortStudent(std::vector<long> &Arr);

private:
  int key_length_;
  int radix_;
  std::vector<std::queue<long>> Q;
};
}

However, I am sure that there is still room for improvement. I have been trying to implement parallelization with OMP library but nothings seems to work. Is there any way where I can improve the previous code? Maybe improving the loops or any other code optimization technique.

Example code for [multi-threaded radix sort](https://stackoverflow.com/questions/68875534/68876177#68876177), but it uses Windows native threads. You could use this as a baseline benchmark. It sorts 64 bit unsigned integers, and doesn't take the max value into account, which would help reduce number of passes, but since they are in parallel, it may not make that much difference. — rcgldr, Dec 19 '21 at 03:10

score 1 · Answer 1 · answered Dec 15 '21 at 08:38

As suggested in the comments, first thing is to get the API right.

findMax can be replaced by std::max_element( ), which uses iterators, and doesn't make a copy of the input.

Other suspicious thing is Q[K].push(Arr[j]);. If memory so permits, at least reserve the maximum amount of elements in each queue -- otherwise the queues need to copy old data when resizing.

Then if possible, using raw pointers with no out of range check, you can push() and pop() with auto popped = *tail++ and *head++ = new_element; My observation is that while STL is correctly implemented and is fast to develop with, the support of dynamic memory allocation in insertions practically always degrades performance compared to known static allocations.

Third thing is to specialise the radix for powers of two, since now the division is strength reduced to shift, and the modulus is strength reduced to logical and (by some constants, which need to be calculated).

Especially when radix is a power of two, and possibly otherwise too, I guess it's not useful to calculate K==0 conditionally: if (Arr[j] < i) K = 0;.

How to optimize this Radix-Sort algorithm in C++?

1 Answers1