theory
Actually, given the data limitation (integers with a maximum value of 10^6), the most efficient approach is counting sort (bucket sort), thanks to its linear complexity, rather than the built-in sorted or .sort (which are O(n log n)).
Obviously, this holds for a big amount of numbers (of a similar order of magnitude). If n is much smaller than the maximum value, then it may (empirically) be better to use the O(n log n) approach.
So:
- from the complexity point of view it is preferable to use counting sort, which is O(max(M, n)), where M is the maximum value and n is the number of values to sort
- for small inputs it may be better to use the built-in sorted, which is O(n log n), where n is the number of samples
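For concreteness, a minimal counting-sort sketch specialised to the stated bound (values in 0..10^6) could look like the following; the name countsort_bounded and the hard-coded max_value default are illustrative assumptions, not part of the original problem:

def countsort_bounded(x, max_value=10**6):
    # one counter per possible value 0..max_value,
    # O(n + max_value) time and O(max_value) extra memory
    counts = [0] * (max_value + 1)
    for a in x:
        counts[a] += 1
    out = []
    for value in range(max_value + 1):
        out.extend([value] * counts[value])  # emit each value counts[value] times
    return out

Note that the running time and memory depend on the fixed bound (here 10^6), not on the actual maximum of the input.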
evaluation
I performed a very simple test on random arrays, showing how a (naively implemented) counting sort performs compared to the heavily optimised Timsort on bounded data.
from timeit import Timer as T
import random

def timsort(x):
    # built-in sort (Timsort), O(n log n)
    return sorted(x)

def countsort(x):
    # naive counting sort: one bucket per possible value in 0..max(x)
    m = max(x)
    buckets = [0] * (m + 1)
    for a in x:
        buckets[a] += 1
    # expand the buckets back into a sorted list
    return [ b for c in ( [a] * buckets[a] for a in xrange(m+1) ) for b in c ]

for t, n in [ (1000, 1000), (1000, 10000), (10000, 10000),
              (10000, 100000), (100000, 100000), (100000, 1000000) ]:
    print 't=', t, 'n=', n
    X = [ random.randint(0, t) for _ in xrange(n) ]  # n random ints in [0, t]
    timer = T(lambda: countsort(X))
    print 'count', timer.timeit(number=1000)
    timer = T(lambda: timsort(X))
    print 'timso', timer.timeit(number=1000)
results:
t= 1000 n= 1000
count 0.759298086166
timso 0.296448945999
t= 1000 n= 10000
count 2.71147489548
timso 3.95411610603
t= 10000 n= 10000
count 7.57060909271
timso 4.03612089157
t= 10000 n= 100000
count 28.043751955
timso 59.6779661179
t= 100000 n= 100000
count 78.1641709805
timso 58.4075620174
t= 100000 n= 1000000
count 286.381267071
timso 1023.59388494
So it seems that if n > 10t (where t is the maximum value), then counting sort is much faster. Otherwise, the better implementation of Timsort outperforms the better asymptotic complexity.
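If one wanted to act on that heuristic in code, a hedged sketch of a dispatcher could look like this (it assumes the countsort function from the benchmark above; the factor of 10 is just the empirical cut-off observed here, not a universal constant):

def sort_bounded_ints(x, max_value):
    # heuristic from the timings above: counting sort only pays off when the
    # number of elements is roughly an order of magnitude larger than the value range
    if len(x) > 10 * max_value:
        return countsort(x)
    return sorted(x)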