3

Operation A

I have N vectors, each containing some number of unique 3D points, for example: std::vector<double*> vec1; and so on.

I am performing a sort operation on each of the vectors like:

 std::sort(vec1.begin(), vec1.end(), sortCriteria());
 std::sort(vec2.begin(), vec2.end(), sortCriteria());
 std::sort(vec3.begin(), vec3.end(), sortCriteria());

Operation B

Suppose I have a vector called "all_point_vector" which holds the 3D points from vec1, vec2, vec3, ...

i.e. 3D points in all_point_vector = points_in_vec1 + ... + points_in_vec3,

and I am performing the sort operation:

std::sort(all_point_vec.begin(), all_point_vec.end(), sortCriteria());

My question is: which of the above methods (Operation A or B) will be faster in general, sorting a single vector (all_point_vector) or sorting the individual vectors? I am only interested in the speed of execution of these two operations.
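For reference, the question does not show what sortCriteria does, so here is a minimal sketch of what such a comparator might look like, assuming each double* points to three consecutive coordinates and a lexicographic (x, y, z) ordering:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical comparator: the actual sortCriteria is not shown in the
// question. This sketch assumes each double* points to {x, y, z} and
// orders points lexicographically by coordinate.
struct sortCriteria {
    bool operator()(const double* a, const double* b) const {
        if (a[0] != b[0]) return a[0] < b[0];
        if (a[1] != b[1]) return a[1] < b[1];
        return a[2] < b[2];
    }
};
```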

Thanks

memC
  • But with "Operation A" you still need another step to merge the three vectors, right? Otherwise you're comparing two algorithms that have very different results. – Manuel Feb 19 '10 at 12:02
  • No, let's assume that I don't need to merge the results – memC Feb 19 '10 at 12:11
  • Then I guess it's logical that "Operation B" will be slower as it has to do more work. – Manuel Feb 19 '10 at 12:19

3 Answers

4

Sorting is an O(n log n) operation. Sorting N vectors of m/N elements each becomes strictly faster than sorting a single vector of m elements as m grows.

Which one is faster for any fixed m can only be determined by profiling.

avakar
  • @avakar: Thank you, I was thinking along similar lines but wasn't sure... I will wait for more votes/answers before accepting any answer :-) – memC Feb 19 '10 at 11:56
  • Notice that operation A will only yield a correct result if the sort criteria cause all points in vec1 to be before the ones in vec2 (and those of vec2 to come before those of vec3). If that is the case, the first operation will be faster and yield the correct result. Notice that if you would start from one big vector and then split it in two separate vectors (one for 'small' points, one for 'big' points) and then sort these vectors, that you are actually performing one phase of a quicksort. – Patrick Feb 19 '10 at 12:06
  • @Patrick: you're making unfounded assumptions about the business rules. – MSalters Feb 19 '10 at 12:28
  • It's faster because it's solving a different (simpler) problem. – Mike Dunlavey Feb 19 '10 at 14:21
3

What avakar said: in theory, sorting a few short vectors should be faster than sorting the whole, but in practice you should measure. I'd just like to show some more math:

Let there be k sequences, where the i-th sequence has n_i elements. Let the total number of elements be N = n_1 + ... + n_k. Sorting the individual sequences has complexity O(n_1 log n_1 + ... + n_k log n_k). Sorting the big sequence has complexity O(N log N) = O((n_1 + ... + n_k) log N) = O(n_1 log N + ... + n_k log N). Now we have to compare

A = n_1 log n_1 + ... + n_k log n_k

B = n_1 log N + ... + n_k log N

Since N > n_i for all i, log N > log n_i for all i. Therefore B is strictly larger than A, i.e. sorting the entire sequence will take more time.

sbk
  • 1
    In particular, if the vectors are all about the same size, namely `N/k`, then your algorithm goes from `O(N log N)` to `O(N log N - N log k)`. (Easy to verify by noting `log(N/k) = log N - log k`.) – Rex Kerr Feb 19 '10 at 19:28
1

Sorting a single array of m elements is a different problem from sorting the same number of elements divided into N arrays, because in the divided case you still don't have a total order over all the elements.

Assuming m = 1024, in the singleton case, m log m = 1024*10 = 10240.

If N=2 you have 512*9 + 512*9 = 9216, but you still have to do a merge step of 1024 comparisons, and 9216 + 1024 = 10240, so it's the same.

[Actually, at each level of the sorting, the number of comparisons is 1 less than the number of items to merge, but the overall result is still O(n log n)]

ADDED: If, as you commented, you don't need to do the merge, then the divided case is faster. (Of course, in that case, you could divide the m items into N=m arrays and not even bother sorting ;-)

Mike Dunlavey
  • @Mike: Yes, I don't need to merge these elements together. Having the simple vectors as in Operation A is going to make for a better code design, but I wanted to check whether, by doing this, I am compromising on performance. It looks like it will in fact give better performance in theory, so the problem is resolved. Operation A is the winner. Thanks! – memC Feb 19 '10 at 15:36
  • @memC: It's better performance, but only marginal. I would encourage you to try this: http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773 – Mike Dunlavey Feb 19 '10 at 17:32