Efficient ways to sort using as many simultaneous comparisons as possible are studied under the name "sorting networks"; Wikipedia has a nice article about them (Sorting Networks).
In a sorting network, the sequence of comparisons is set in advance,
regardless of the outcome of previous comparisons. To sort a larger
number of inputs, a new sorting network must be constructed. Because
the comparison sequence does not depend on the data, it is well suited
to parallel execution.
Here is the pseudocode for sorting networks for arrays of n=3 to n=7 elements.
Compare(v, i, j) {
    if (v[j] < v[i]) Swap(v, i, j);
}

SortingNetwork3(v) {
    Compare(v, 0, 2);
    Compare(v, 0, 1);
    Compare(v, 1, 2);
}

SortingNetwork4(v) {
    Compare(v, 0, 2); Compare(v, 1, 3);
    Compare(v, 0, 1); Compare(v, 2, 3);
    Compare(v, 1, 2);
}

SortingNetwork5(v) {
    Compare(v, 0, 3); Compare(v, 1, 4);
    Compare(v, 0, 2); Compare(v, 1, 3);
    Compare(v, 0, 1); Compare(v, 2, 4);
    Compare(v, 1, 2); Compare(v, 3, 4);
    Compare(v, 2, 3);
}

SortingNetwork6(v) {
    Compare(v, 0, 5); Compare(v, 1, 3); Compare(v, 2, 4);
    Compare(v, 1, 2); Compare(v, 3, 4);
    Compare(v, 0, 3); Compare(v, 2, 5);
    Compare(v, 0, 1); Compare(v, 2, 3); Compare(v, 4, 5);
    Compare(v, 1, 2); Compare(v, 3, 4);
}

SortingNetwork7(v) {
    Compare(v, 0, 6); Compare(v, 2, 3); Compare(v, 4, 5);
    Compare(v, 0, 2); Compare(v, 1, 4); Compare(v, 3, 6);
    Compare(v, 0, 1); Compare(v, 2, 5); Compare(v, 3, 4);
    Compare(v, 1, 2); Compare(v, 4, 6);
    Compare(v, 2, 3); Compare(v, 4, 5);
    Compare(v, 1, 2); Compare(v, 3, 4); Compare(v, 5, 6);
}
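For anyone who wants to try this out, here is a minimal runnable sketch of the pseudocode above in Python. Only the n=3 and n=4 networks are transcribed. The names compare, sort_with_network and sorts_every_permutation are my own, not from any library. The check simply tries every permutation, which is only feasible for small n.

```python
from itertools import permutations


def compare(v, i, j):
    """Compare-exchange: leave the smaller of v[i], v[j] at index i."""
    if v[j] < v[i]:
        v[i], v[j] = v[j], v[i]


# Comparator sequences copied from the pseudocode, one row per step.
NETWORK_3 = [
    (0, 2),
    (0, 1),
    (1, 2),
]
NETWORK_4 = [
    (0, 2), (1, 3),
    (0, 1), (2, 3),
    (1, 2),
]


def sort_with_network(v, network):
    """Run the fixed comparator sequence on v in place and return v."""
    for i, j in network:
        compare(v, i, j)
    return v


def sorts_every_permutation(n, network):
    """Brute-force check: does the network sort all n! orderings of 0..n-1?"""
    target = list(range(n))
    return all(
        sort_with_network(list(p), network) == target
        for p in permutations(range(n))
    )


print(sort_with_network([3, 1, 2], NETWORK_3))  # [1, 2, 3]
print(sorts_every_permutation(4, NETWORK_4))    # True
```

The larger networks can be checked the same way by transcribing their rows into a list of pairs.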
Comparisons on the same row (step) touch disjoint pairs of elements, so they can be done simultaneously, using multithreading, vectorization or specialized hardware.
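To illustrate why a row can run in one step, here is a small Python sketch that simulates simultaneous execution for the n=4 network. Every comparator in a row reads from a snapshot of the previous state, so the order within the row does not matter. The helper names (is_parallel_safe, apply_layer, sort_by_layers) are made up for this example. Real parallel code would use threads, SIMD instructions or hardware rather than a loop.

```python
# Rows of SortingNetwork4 from the pseudocode, one list per step.
LAYERS_4 = [
    [(0, 2), (1, 3)],
    [(0, 1), (2, 3)],
    [(1, 2)],
]


def is_parallel_safe(layer):
    """True if no index appears in two comparators of the same step."""
    used = [index for pair in layer for index in pair]
    return len(used) == len(set(used))


def apply_layer(v, layer):
    """Apply one step as if all of its comparators ran at once.

    Each comparator reads from the snapshot v and writes to out, so the
    result does not depend on the order in which the pairs are visited.
    """
    out = list(v)
    for i, j in layer:
        out[i], out[j] = min(v[i], v[j]), max(v[i], v[j])
    return out


def sort_by_layers(v, layers):
    """Run the network step by step and return the sorted list."""
    for layer in layers:
        v = apply_layer(v, layer)
    return v


print(all(is_parallel_safe(layer) for layer in LAYERS_4))  # True
print(sort_by_layers([4, 3, 2, 1], LAYERS_4))              # [1, 2, 3, 4]
```

Here 4 elements are sorted in 3 steps, even though 5 comparisons are made in total.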
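In theory, n elements can be sorted in O(log(n)) steps when n is sufficiently large, but the construction that achieves this bound has constants too large to be practical, and finding an optimal sorting network for a given n is a computationally hard problem. There are practical algorithms, such as Batcher's odd-even mergesort, that create sorting networks with O(log^2(n)) steps. Currently, sorting networks that are optimal in the number of steps are known for n up to 17. Note that the number of comparisons is still at least on the order of n log(n): running comparisons in parallel reduces the number of steps, not the total work.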
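One well-known construction with O(log^2(n)) steps is Batcher's odd-even mergesort. The following Python sketch generates its network row by row and counts steps and comparisons. It assumes n is a power of two. The function names are my own. The check relies on the zero-one principle: a comparator network that sorts every input of 0s and 1s sorts every input. Treat it as an illustration of the growth rates, not as an optimal network generator.

```python
from itertools import product


def batcher_layers(n):
    """Rows of Batcher's odd-even mergesort network (n a power of two)."""
    layers = []
    p = 1
    while p < n:
        k = p
        while k >= 1:
            # All comparators produced for one (p, k) pair form one step.
            layer = []
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k - 1, n - j - k - 1) + 1):
                    lo, hi = i + j, i + j + k
                    # Only compare elements inside the same merged block.
                    if lo // (2 * p) == hi // (2 * p):
                        layer.append((lo, hi))
            layers.append(layer)
            k //= 2
        p *= 2
    return layers


def sorts_all_binary_inputs(n, layers):
    """Zero-one principle check over all 2**n inputs of 0s and 1s."""
    for bits in product((0, 1), repeat=n):
        v = list(bits)
        for layer in layers:
            for i, j in layer:
                if v[j] < v[i]:
                    v[i], v[j] = v[j], v[i]
        if v != sorted(v):
            return False
    return True


for n in (4, 8, 16):
    layers = batcher_layers(n)
    steps = len(layers)
    comparisons = sum(len(layer) for layer in layers)
    print(n, steps, comparisons)

print(sorts_all_binary_inputs(8, batcher_layers(8)))
```

For n a power of two this construction uses log2(n) * (log2(n) + 1) / 2 steps, for example 6 steps and 19 comparisons for n=8.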