I needed to sort an array of large objects, and it got me thinking: could there be a way to minimize the number of swaps?
So I used quicksort (though any other fast sort should work here too) to sort indices into the array; indices are cheap to swap. Then I used those indices to swap the actual objects into their final places. Unfortunately, this uses O(n) additional space to store the indices. The code below illustrates the algorithm (which I'm calling IndexSort); in my tests, it appears to be faster than plain quicksort for arrays of large objects.
#include <algorithm> // sort, swap
#include <cstddef>   // size_t
#include <numeric>   // iota
#include <vector>

using namespace std;

template <class Itr>
void IndexSort(Itr begin, Itr end)
{
    const size_t count = end - begin;

    // Create indices
    vector<size_t> ind(count);
    iota(ind.begin(), ind.end(), 0);

    // Sort indices
    sort(ind.begin(), ind.end(), [&begin] (const size_t i, const size_t j)
    {
        return begin[i] < begin[j];
    });

    // Create indices to indices. This provides
    // constant-time lookup in the next step.
    vector<size_t> ind2(count);
    for(size_t i = 0; i < count; ++i)
        ind2[ind[i]] = i;

    // Swap the objects into their final places, keeping
    // ind and ind2 consistent after every swap
    for(size_t i = 0; i < count; ++i)
    {
        if( ind[i] == i )
            continue;

        swap(begin[i], begin[ind[i]]);

        const size_t j = ind[i];
        swap(ind[i], ind[ind2[i]]);
        swap(ind2[i], ind2[j]);
    }
}
Now I have measured the number of swaps (of the large objects) performed by both quicksort and IndexSort, and found that quicksort does far more of them. So I know why IndexSort could be faster.
But can anyone with a more academic background explain why/how this algorithm actually works? (It's not intuitive to me, even though I somehow came up with it.)
Thanks!
Edit: The following code was used to verify the results of IndexSort:
#include <algorithm> // copy, sort
#include <cstdlib>   // rand
#include <cstring>   // memcmp, memcpy
#include <iostream>
#include <vector>

using namespace std;

// A class whose objects will be large
struct A
{
    int id;
    char data[1024];

    // Use the id for less-than ordering (for simplicity)
    bool operator < (const A &other) const
    {
        return id < other.id;
    }

    // Copy-assign all data from another object
    A & operator = (const A &other)
    {
        memcpy(this, &other, sizeof(A));
        return *this;
    }
};

int main()
{
    const size_t arrSize = 1000000;

    // Create an array of objects to be sorted
    vector<A> randArray(arrSize);
    for( auto &item: randArray )
        item.id = rand();

    // arr1 will be sorted using quicksort
    vector<A> arr1(arrSize);
    copy(randArray.begin(), randArray.end(), arr1.begin());

    // arr2 will be sorted using IndexSort
    vector<A> arr2(arrSize);
    copy(randArray.begin(), randArray.end(), arr2.begin());

    {
        // Measure time for this
        sort(arr1.begin(), arr1.end());
    }

    {
        // Measure time for this
        IndexSort(arr2.begin(), arr2.end());
    }

    // Check that IndexSort yielded the same result as quicksort
    if( memcmp(arr1.data(), arr2.data(), sizeof(A) * arr1.size()) != 0 )
        cout << "sort failed" << endl;

    return 0;
}
Edit: Made the test less pathological; reduced the size of the large object class to just 1024 bytes (plus one int) and increased the number of objects to be sorted to one million. IndexSort is still significantly faster than quicksort.
Edit: This requires more testing, for sure. But it makes me wonder: what if std::sort could, at compile time, check the object size and, depending on some size threshold, choose either the existing quicksort implementation or this IndexSort implementation?
Also, IndexSort could be described as an "in-place tag sort" (see samgak's answer and my comments below).