theory
Actually, given the data limitation (integers with a maximum value of 10^6), the most efficient approach is counting sort (bucket sort), thanks to its linear complexity, rather than the built-in sorted or .sort (which are O(n log n)).
Obviously, this holds for a big amount of numbers (of a similar order of magnitude). If n is much smaller than the maximum value, then it may (empirically) be better to use the O(n log n) approach.
So:
- from the complexity point of view it is preferable to use counting sort, which is O(max(M, n)), where M is the maximum value and n is the number of values to sort
- for small inputs it may be better to use the built-in sorted, which is O(n log n), where n is the number of samples
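For concreteness, a minimal counting-sort sketch specialised to the stated bound (values in 0..10^6) could look like the following; the name countsort_bounded and the hard-coded max_value default are illustrative assumptions, not part of the original problem:

def countsort_bounded(x, max_value=10**6):
    # one counter per possible value 0..max_value,
    # O(n + max_value) time and O(max_value) extra memory
    counts = [0] * (max_value + 1)
    for a in x:
        counts[a] += 1
    out = []
    for value in range(max_value + 1):
        out.extend([value] * counts[value])  # emit each value counts[value] times
    return out

Note that the running time and memory depend on the fixed bound (here 10^6), not on the actual maximum of the input.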
evaluation
I performed a very simple test on random arrays, showing how a (naively implemented) counting sort performs compared to the heavily optimised Timsort on bounded data.
from timeit import Timer as T
import random

def timsort(x):
    # built-in sort (Timsort), O(n log n)
    return sorted(x)

def countsort(x):
    # naive counting sort: one bucket per possible value in 0..max(x)
    m = max(x)
    buckets = [0] * (m + 1)
    for a in x:
        buckets[a] += 1
    # expand the buckets back into a sorted list
    return [ b for c in ( [a] * buckets[a] for a in xrange(m+1) ) for b in c ]

for t, n in [ (1000, 1000), (1000, 10000), (10000, 10000),
              (10000, 100000), (100000, 100000), (100000, 1000000) ]:
    print 't=', t, 'n=', n
    X = [ random.randint(0, t) for _ in xrange(n) ]  # n random ints in [0, t]
    timer = T(lambda: countsort(X))
    print 'count', timer.timeit(number=1000)
    timer = T(lambda: timsort(X))
    print 'timso', timer.timeit(number=1000)
results:
t= 1000 n= 1000
count 0.759298086166
timso 0.296448945999
t= 1000 n= 10000
count 2.71147489548
timso 3.95411610603
t= 10000 n= 10000
count 7.57060909271
timso 4.03612089157
t= 10000 n= 100000
count 28.043751955
timso 59.6779661179
t= 100000 n= 100000
count 78.1641709805
timso 58.4075620174
t= 100000 n= 1000000
count 286.381267071
timso 1023.59388494
So it seems that if n > 10t (where t is the maximum value), then counting sort is much faster. Otherwise, the better implementation of Timsort outperforms the better asymptotic complexity.
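If one wanted to act on that heuristic in code, a hedged sketch of a dispatcher could look like this (it assumes the countsort function from the benchmark above; the factor of 10 is just the empirical cut-off observed here, not a universal constant):

def sort_bounded_ints(x, max_value):
    # heuristic from the timings above: counting sort only pays off when the
    # number of elements is roughly an order of magnitude larger than the value range
    if len(x) > 10 * max_value:
        return countsort(x)
    return sorted(x)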