I am trying to write a counting sort in Python that beats the built-in Timsort in certain situations. Right now it beats the built-in sorted() function, but only for very large lists (1 million integers and longer; I haven't tried more than 10 million) and only when the range of values is no larger than 10,000. Even then the victory is narrow: count sort only wins by a significant margin on random lists specifically tailored to it.
I have read about the astounding performance gains that can be achieved by vectorizing Python code, but I don't fully understand how to do it or how it could be applied here. I would like to know how I can vectorize this code to speed it up, and any other performance suggestions are welcome.
Current fastest version using just Python and the standard library:
from itertools import chain, repeat

def untimed_countsort(unsorted_list):
    # Tally the occurrences of each value.
    counts = {}
    for num in unsorted_list:
        try:
            counts[num] += 1
        except KeyError:
            counts[num] = 1
    # Rebuild the list by repeating each value in the range as many times
    # as it was seen; .get(num, 0) skips values that never occur, which
    # would otherwise raise KeyError on inputs with gaps in the range.
    sorted_list = list(
        chain.from_iterable(
            repeat(num, counts.get(num, 0))
            for num in xrange(min(counts), max(counts) + 1)))
    return sorted_list
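For reference, a timing sketch along the lines of the tests described above (1 million integers drawn from a range of 10,000; the harness itself and the repeat count are only illustrative):

import random
import timeit

# A list tailored to counting sort: many elements, small value range.
data = [random.randint(0, 9999) for _ in xrange(1000000)]

t_count = timeit.timeit(lambda: untimed_countsort(data), number=3)
t_builtin = timeit.timeit(lambda: sorted(data), number=3)
print 'count sort: %.2fs   sorted: %.2fs' % (t_count, t_builtin)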
All that matters here is raw speed, so sacrificing even more space for speed gains is completely fair game.
I realize the code is fairly short and clear already, so I don't know how much room there is for improvement in speed.
If anyone has a change that makes the code shorter without making it slower, that would be awesome as well.
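For example, the tally loop could probably be collapsed with collections.Counter; the sketch below (the name counter_countsort is only illustrative) shows the idea, though I have not measured whether it is any faster:

from collections import Counter
from itertools import chain, repeat

def counter_countsort(unsorted_list):
    # Counter does the tallying; missing values count as 0, so the
    # rebuild step is unchanged apart from the lookup.
    counts = Counter(unsorted_list)
    return list(
        chain.from_iterable(
            repeat(num, counts[num])
            for num in xrange(min(counts), max(counts) + 1)))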
Update: execution time is down almost 80%, and the count sort is now three times as fast as Timsort in my current tests!
The absolute fastest way to do this by a LONG shot is using this one-liner with numpy:
import numpy

def np_sort(unsorted_np_array):
    # bincount tallies every value from 0 to the maximum (it requires
    # non-negative integers), and repeat expands each value that many times.
    return numpy.repeat(numpy.arange(1 + unsorted_np_array.max()),
                        numpy.bincount(unsorted_np_array))
This runs about 10-15 times faster than the pure Python version, and about 40 times faster than Timsort. It takes a numpy array in and returns a numpy array.
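For example (the array size and value range here are only illustrative):

import numpy

# Build a test array of 1 million non-negative integers in a range of 10,000.
data = numpy.random.randint(0, 10000, size=1000000)

result = np_sort(data)

# Sanity check against numpy's own sort.
assert (result == numpy.sort(data)).all()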