3

Operation A

I have N vectors, each containing some number of unique 3D points, for example: std::vector<double*> vec1; and so on.

I am performing a sort operation on each of the vectors like:

 std::sort(vec1.begin(), vec1.end(), sortCriteria());
 std::sort(vec2.begin(), vec2.end(), sortCriteria());
 std::sort(vec3.begin(), vec3.end(), sortCriteria());

Operation B

Suppose I have a vector called "all_point_vector" which holds the 3D points from vec1, vec2, vec3, ...

i.e. 3D points in all_point_vector = points_in_vec1 + ... + points_in_vec3,

and I am performing the sort operation:

std::sort(all_point_vec.begin(), all_point_vec.end(), sortCriteria());

My question is: which of the above methods (Operation A or B) will be faster in general, sorting a single vector (all_point_vector) or sorting the individual vectors? I am only interested in the speed of execution of these two operations.
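For reference, the question does not show what sortCriteria does, so here is a minimal sketch of what such a comparator might look like, assuming each double* points to three consecutive coordinates and a lexicographic (x, y, z) ordering:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical comparator: the actual sortCriteria is not shown in the
// question. This sketch assumes each double* points to {x, y, z} and
// orders points lexicographically by coordinate.
struct sortCriteria {
    bool operator()(const double* a, const double* b) const {
        if (a[0] != b[0]) return a[0] < b[0];
        if (a[1] != b[1]) return a[1] < b[1];
        return a[2] < b[2];
    }
};
```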

Thanks

memC
  • But with "Operation A" you still need another step to merge the three vectors, right? Otherwise you're comparing two algorithms that have very different results. – Manuel Feb 19 '10 at 12:02
  • No, let's assume that I don't need to merge the results – memC Feb 19 '10 at 12:11
  • Then I guess it's logical that "Operation B" will be slower as it has to do more work. – Manuel Feb 19 '10 at 12:19

3 Answers

4

Sorting is an O(n log n) operation. Sorting N vectors of m/N elements each becomes strictly faster than sorting a single vector of m elements as m grows.

Which one is faster for any fixed m can only be determined by profiling.

avakar
  • @avakar: Thank you, I was thinking along similar lines but wasn't sure... I will wait for more votes/answers before accepting any answer :-) – memC Feb 19 '10 at 11:56
  • Notice that operation A will only yield a correct result if the sort criteria cause all points in vec1 to be before the ones in vec2 (and those of vec2 to come before those of vec3). If that is the case, the first operation will be faster and yield the correct result. Notice that if you would start from one big vector and then split it in two separate vectors (one for 'small' points, one for 'big' points) and then sort these vectors, that you are actually performing one phase of a quicksort. – Patrick Feb 19 '10 at 12:06
  • @Patrick: you're making unfounded assumptions about the business rules. – MSalters Feb 19 '10 at 12:28
  • It's faster because it's solving a different (simpler) problem. – Mike Dunlavey Feb 19 '10 at 14:21
3

What avakar said: in theory, sorting a few short vectors should be faster than sorting the whole, but in practice you should measure. I'd just like to show some more math:

Let there be k sequences, where the i-th sequence has n_i elements. Let the total number of elements be N = n_1 + ... + n_k. Sorting the individual sequences has complexity O(n_1 log n_1 + ... + n_k log n_k). Sorting the big sequence has complexity O(N log N) = O((n_1 + ... + n_k) log N) = O(n_1 log N + ... + n_k log N). Now we have to compare

A = n_1 log n_1 + ... + n_k log n_k

B = n_1 log N + ... + n_k log N

Since N > n_i for all i, log N > log n_i for all i. Therefore B is strictly larger than A, i.e. sorting the entire sequence will take more time.

sbk
  • 1
    In particular, if the vectors are all about the same size, namely `N/k`, then your algorithm goes from `O(N log N)` to `O(N log N - N log k)`. (Easy to verify by noting `log(N/k) = log N - log k`.) – Rex Kerr Feb 19 '10 at 19:28
1

Sorting a single array of m elements is a different problem from sorting the same number of elements divided into N arrays, because in the divided case you still don't have a total order over all the elements.

Assuming m = 1024, in the singleton case, m log m = 1024*10 = 10240.

If N=2 you have 512*9 + 512*9 = 9216, but you still have to do a merge step of 1024 comparisons, and 9216 + 1024 = 10240, so it's the same.

[Actually, at each level of the sorting, the number of comparisons is 1 less than the number of items to merge, but the overall result is still O(n log n)]

ADDED: If, as you commented, you don't need to do the merge, then the divided case is faster. (Of course, in that case, you could divide the m items into N=m arrays and not even bother sorting ;-)

Mike Dunlavey
  • @Mike: Yes, I don't need to merge these elements together. Having the simple vectors as in Operation A is going to make for a better code design, but I wanted to check whether, by doing this, I am compromising on performance. It looks like it will in fact give better performance in theory, so the problem is resolved. Operation A is the winner. Thanks! – memC Feb 19 '10 at 15:36
  • @memC: It's better performance, but only marginal. I would encourage you to try this: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773 – Mike Dunlavey Feb 19 '10 at 17:32