I needed to sort an array of large objects, and it got me thinking: could there be a way to minimize the number of swaps?

So I used quicksort (though any other fast sort should work here too) to sort indices to the elements in the array; indices are cheap to swap. Then I used those indices to swap the actual objects into their final places. Unfortunately, this uses O(n) additional space to store the indices. The code below illustrates the algorithm (which I'm calling IndexSort); in my tests it appears to be faster than plain quicksort for arrays of large objects.

#include <algorithm> // std::sort
#include <numeric>   // std::iota
#include <utility>   // std::swap
#include <vector>
using namespace std;

template <class Itr>
void IndexSort(Itr begin, Itr end)
{
    const size_t count = end - begin;

    // Create indices
    vector<size_t> ind(count);
    iota(ind.begin(), ind.end(), 0);

    // Sort indices
    sort(ind.begin(), ind.end(), [&begin] (const size_t i, const size_t j)
    {
        return begin[i] < begin[j];
    });

    // Build the inverse permutation: ind2[k] is the sorted position
    // of the element currently at index k. This provides constant
    // time lookup in the next step.
    vector<size_t> ind2(count);
    for(size_t i = 0; i < count; ++i)
        ind2[ind[i]] = i;

    // Swap the objects into their final places
    for(size_t i = 0; i < count; ++i)
    {
        if( ind[i] == i )
            continue;

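        // Move the element that belongs at position i into place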
        swap(begin[i], begin[ind[i]]);

        const size_t j = ind[i];

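        // The element that was at i (destined for position ind2[i])
        // now lives at j; update both index arrays to match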
        swap(ind[i], ind[ind2[i]]);
        swap(ind2[i], ind2[j]);
    }
}

Now I have measured the swaps (of the large objects) done by both quicksort and IndexSort, and found that quicksort does a far greater number of swaps. So I know why IndexSort could be faster.
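
(For reference, one sketch of how the object swaps could be counted -- not necessarily my exact harness -- is to give the element type a swap overload that bumps a counter; the library exchanges elements as if by an unqualified call to swap, so the overload is found through ADL. Real implementations also move elements directly, e.g. in their insertion-sort phase, so a complete count would instrument the move operations too.)

#include <cstddef>
#include <utility> // std::swap

static std::size_t g_swapCount = 0; // number of large-object swaps observed

struct Big
{
    int id;
    char data[1024];

    bool operator < (const Big &other) const { return id < other.id; }
};

// Found via ADL from the library's internal swap calls
void swap(Big &a, Big &b)
{
    ++g_swapCount;
    std::swap(a.id, b.id);
    std::swap(a.data, b.data); // std::swap has an array overload since C++11
}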

But can anyone with a more academic background explain why/how this algorithm actually works? (It's not intuitive to me, even though I somehow came up with it.)

Thanks!

Edit: The following code was used to verify the results of IndexSort

#include <algorithm> // std::copy, std::sort
#include <chrono>
#include <cstdlib>   // rand
#include <cstring>   // memcmp, memcpy
#include <iostream>
#include <vector>
using namespace std;

// A class whose objects will be large
struct A
{
    int id;
    char data[1024];

    // Use the id to compare less than ordering (for simplicity)
    bool operator < (const A &other) const
    {
        return id < other.id;
    }

    // Copy assign all data from another object
    A& operator = (const A &other)
    {
        memcpy(this, &other, sizeof(A));
        return *this;
    }
};

int main()
{
    const size_t arrSize = 1000000;

    // Create an array of objects to be sorted
    vector<A> randArray(arrSize);
    for( auto &item: randArray )
        item.id = rand();

    // arr1 will be sorted using quicksort
    vector<A> arr1(arrSize);
    copy(randArray.begin(), randArray.end(), arr1.begin());

    // arr2 will be sorted using IndexSort
    vector<A> arr2(arrSize);
    copy(randArray.begin(), randArray.end(), arr2.begin());

    {
        const auto t0 = chrono::steady_clock::now();
        sort(arr1.begin(), arr1.end());
        const auto t1 = chrono::steady_clock::now();
        cout << "std::sort: " << chrono::duration_cast<chrono::milliseconds>(t1 - t0).count() << " ms\n";
    }

    {
        const auto t0 = chrono::steady_clock::now();
        IndexSort(arr2.begin(), arr2.end());
        const auto t1 = chrono::steady_clock::now();
        cout << "IndexSort: " << chrono::duration_cast<chrono::milliseconds>(t1 - t0).count() << " ms\n";
    }

    // Check if IndexSort yielded the same result as quicksort
    if( memcmp(arr1.data(), arr2.data(), sizeof(A) * arr1.size()) != 0 )
        cout << "sort failed" << endl;

    return 0;
}

Edit: Made the test less pathological; reduced the size of the large object class to just 1024 bytes (plus one int), and increased the number of objects to be sorted to one million. This still results in IndexSort being significantly faster than quicksort.

Edit: This requires more testing for sure. But it makes me think: what if std::sort could, at compile time, check the object size and (depending on some size threshold) choose either the existing quicksort implementation or this IndexSort implementation?
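
For illustration, such a dispatch might look something like this (SmartSort and the 256-byte threshold are made up here; the right cutoff would have to come from measurement):

#include <algorithm>
#include <iterator>

template <class Itr>
void SmartSort(Itr begin, Itr end)
{
    using T = typename std::iterator_traits<Itr>::value_type;
    if (sizeof(T) > 256)           // arbitrary threshold, needs tuning
        IndexSort(begin, end);     // few moves of the big objects
    else
        std::sort(begin, end);     // the usual introsort is fine
}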

Also, IndexSort could be described as an "in-place tag sort" (see samgak's answer and my comments below).

  • Note: Reordering a vector (with sorted indices) is nontrivial. See [Reorder vector using a vector of indices](http://stackoverflow.com/q/838384/153285). – Potatoswatter May 06 '15 at 05:39
  • Just for the record: You could also use a container where swapping isn't that expensive, i.e. a doubly-linked list. – Ulrich Eckhardt May 06 '15 at 05:48

2 Answers


It seems to be a tag sort:

For example, the popular recursive quicksort algorithm provides quite reasonable performance with adequate RAM, but due to the recursive way that it copies portions of the array it becomes much less practical when the array does not fit in RAM, because it may cause a number of slow copy or move operations to and from disk. In that scenario, another algorithm may be preferable even if it requires more total comparisons.

One way to work around this problem, which works well when complex records (such as in a relational database) are being sorted by a relatively small key field, is to create an index into the array and then sort the index, rather than the entire array. (A sorted version of the entire array can then be produced with one pass, reading from the index, but often even that is unnecessary, as having the sorted index is adequate.) Because the index is much smaller than the entire array, it may fit easily in memory where the entire array would not, effectively eliminating the disk-swapping problem. This procedure is sometimes called "tag sort".

As described above, tag sort can be used to sort a large array of data that cannot fit into memory. But even when everything fits in memory, it still requires fewer memory read/write operations for arrays of large objects, as your solution illustrates, because the entire objects are not copied each time.

Implementation detail: while your implementation sorts just the indices, and refers back to the original array of objects via the index when doing comparisons, another way of implementing it is to store index/sort key pairs in the sort buffer, using the sort keys for comparisons. This means that you can do the sort without having the entire array of objects in memory at once.
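
A sketch of that variant (the Record type and its key field are illustrative, not from the question): extract (key, index) pairs, sort those, and only then touch the full records:

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Record { int key; char payload[1024]; };

// Returns the original positions of the records in sorted order,
// comparing only the extracted keys; the large records themselves
// never enter the sort buffer.
std::vector<std::size_t> TagSort(const std::vector<Record> &records)
{
    std::vector<std::pair<int, std::size_t>> tags;
    tags.reserve(records.size());
    for (std::size_t i = 0; i < records.size(); ++i)
        tags.emplace_back(records[i].key, i);

    std::sort(tags.begin(), tags.end()); // pairs compare by key first

    std::vector<std::size_t> order(records.size());
    for (std::size_t i = 0; i < records.size(); ++i)
        order[i] = tags[i].second;
    return order;
}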

One example of a tag sort is the LINQ to Objects sorting algorithm in .NET:

The sort is somewhat flexible in that it lets you supply a comparison delegate. It does not, however, let you supply a swap delegate. That’s okay in many cases. However, if you’re sorting large structures (value types), or if you want to do an indirect sort (often referred to as a tag sort), a swap delegate is a very useful thing to have. The LINQ to Objects sorting algorithm, for example uses a tag sort internally. You can verify that by examining the source, which is available in the .NET Reference Source. Letting you pass a swap delegate would make the thing much more flexible.

  • Thank you, it does appear to be a tag sort according to that definition. Are there any other open implementations of a tag sort? – Francis Xavier May 06 '15 at 04:33
  • MySQL uses a kind of tag sort, see the description here: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html (search for "The original filesort algorithm works as follows"). Note that it sorts key/id pairs in the sort buffer, rather than entire records. – samgak May 06 '15 at 04:49
  • As far as I understood from that page, that algorithm appears to be a bit different. No optimal swapping of large objects was required there; the objects were simply copied into a new buffer in sorted order. That would require an additional O(n) memory as big as the original array. – Francis Xavier May 06 '15 at 05:11
  • It's not identical, but it includes the basic properties of a tag sort, because the sort buffer only contains the row index/id and sort key (rather than the entire database records, i.e. large objects) while the sort is being performed. It's kind of obfuscated by the fact that several other optimizations are described there at the same time, and yes, the objects are copied into a new array in sorted order at the end (because they are being returned from a query) rather than swapped in place in the existing array. If that is a bad example, refer to the .NET tag sort mentioned in the answer. – samgak May 06 '15 at 05:23
  • I think the "interesting" part of IndexSort is the part where the actual (large) objects are swapped into their final places. If one replaces that mechanism with simply copying the objects into a new buffer, then it's only a solution when ample space is available (and optimizing space utilization is unnecessary). – Francis Xavier May 06 '15 at 06:03
  • Maybe the IndexSort implementation described above is a variant of tag sort which optimizes space utilization. From the above definition of tag sort, it does not seem to concern itself with how the final sorted array is created, whereas IndexSort deals with optimizing exactly that. Maybe it can be called an "in-place tag sort" (similar to how there exists an in-place version of merge sort). – Francis Xavier May 06 '15 at 06:15
  • Yes, that sounds like what it is. On the flip side, if you want to optimize sorting speed at the expense of space and slightly slower array access, you could store the sorted index array with the object array, eliminating the final object swapping altogether, and do all accesses of the array elements via the sorted index from then on. This is more or less what databases do with indexed tables. – samgak May 06 '15 at 06:41

I wouldn't exactly call that an algorithm so much as an indirection.

The reason you're doing fewer swaps of the large objects is that you already have the sorted indices (the final result, so there are no redundant intermediary swaps). If you counted the index swaps in addition to the object swaps, you'd get more swaps in total with your index sorting.

Nevertheless, you're not necessarily bound by algorithmic complexity all the time. Spending the expensive sorting time swapping cheap little indices around saves more time than it costs.

So you have a higher number of total swaps with the index sort, but the bulk of them are cheaper and you're doing far fewer of the expensive swaps of the original object.

The reason it's faster is that your original objects are much larger than indices but gain nothing from a move constructor (they don't store dynamically-allocated data that a move could transfer cheaply).

At this level, the cost of a swap is dominated by the structure size of the elements you're sorting, and this is a matter of practical efficiency rather than theoretical algorithmic complexity. If you get into the hardware details, it boils down to things like how much fits in a cache line.

With sorting, the amount of computation done over the same data set is substantial. We're doing, at best, O(N log N) compares and swaps, often more in practice. So when you use indices, you make both the swapping and the comparison potentially cheaper (in your case, just the swapping, since you're still using a comparator predicate to compare the original objects).

Put another way, std::sort is O(N log N). Your index sort is O(N + N log N), yet you're making the bigger N log N part much cheaper by swapping indices through one level of indirection.

In your updated test case, you're using a very pathological case of enormous objects, so your index sorting will pay off big time there. More commonly, you don't have objects of a type T where sizeof(T) spans 100 kilobytes. Typically, an object storing data of such size holds a pointer to it elsewhere along with a move constructor that simply shallow-copies the pointer (making it about as cheap to swap as an int). So most of the time you won't get such a big payoff from sorting things indirectly this way, but if you do have enormous objects like that, this kind of index or pointer sort will be a great optimization.
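
For example, a type like this hypothetical one is already cheap for std::sort to move around (std::sort only requires movable elements), so an index sort would gain much less:

#include <memory>

// The payload lives behind a pointer, so moving or swapping exchanges
// a pointer and an int -- about as cheap as swapping indices.
struct BigMovable
{
    int id = 0;
    std::unique_ptr<char[]> data{new char[100 * 1024]};

    bool operator < (const BigMovable &other) const { return id < other.id; }
};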

To the edit in the question: "This requires more testing for sure. But it makes me think, what if std::sort could, at compile time, check the object size, and (depending on some size threshold) choose either the existing quicksort implementation or this IndexSort implementation."

I think that's not a bad idea. At least making it available might be a nice start. Yet I would suggest against the automatic approach. The reason I think it's better left to the side, as a potential optimization the developer can opt into when appropriate, is that there are cases where memory is more valuable than processing. The indices are going to seem trivial if you create 1-kilobyte objects, but there are a lot of iffy, borderline cases where you might be dealing with something more like 32-64 bytes (e.g. a list of 32-byte, 4-component double-precision mathematical vectors). In those borderline cases the index sort method may still be faster, but the temporary storage of two extra indices per element may actually become a factor (and may occasionally cause a slowdown at runtime depending on the physical state of the environment).

Consider the attempt to special-case vector<bool>: at the time it seemed like a great idea to treat vector<bool> as a bitset, but now it often gets in the way and creates more harm than good. So I'd suggest leaving this to the side and letting people opt into it, but having it available might be a welcome addition.

  • With indices, the comparison remains the same and the swapping _of the indices_ becomes cheaper, sure. However, the _actual_ objects still need to be swapped into place. Why this mechanism requires fewer final object swaps is what confuses me; I was under the impression that quicksort was already optimal with regard to the number of swaps required. – Francis Xavier May 06 '15 at 04:38
  • With quicksort, you're often swapping the same element more than once. Even an optimal one applied over a million elements would do O(N log N) swaps, so optimally 20 million swaps. It's because the algorithm has intermediary steps, so there's not necessarily one swap per element. So you still have to do that many swaps on indices, but index swaps are much cheaper. Then your final loop to swap objects in place is just O(n) -- one swap per element, and two swaps per index. – May 06 '15 at 04:44
  • Put crudely, you're doing fewer final swaps on the expensive-to-swap objects with yours because the sorted indices give you the answer already. Quicksort doesn't have the final sorted answer, and it needs to do more swaps to get there, treating the original container you passed in as an intermediary to compute results in multiple passes. – May 06 '15 at 04:46
  • If you think of the quicksort algorithm, each phase is swapping elements to put the lesser elements to the left of the pivot and the greater elements to the right. The same elements are going to be swapped many times over potentially as you then recurse and choose new sub-pivots and sub-sub-pivots. So it's a very optimal algorithm in terms of algorithmic sorting complexity, but nowhere near as optimal as O(n) in terms of swaps if you sort indices in advance and use an algorithm that already has the final sorted answer on your bulky objects. –  May 06 '15 at 04:50
  • Thanks for your comments, I've edited the test code to now sort one million objects, each object being about 1kb in size. This is no longer pathological and could actually be a common scenario! – Francis Xavier May 06 '15 at 05:38
  • @FrancisXavier Nice -- even for 1kb objects, you should get a nice improvement sorting just ints (typically 4 bytes) around. –  May 06 '15 at 05:43