1

I have two python functions for sorting a list of integers contained as strings. They are follows:

import random

n = 10000000
unsorted = [str(x) for x in range(n)]
random.shuffle(unsorted)

def sort_alg1(unsorted):
    unsorted.sort(key = lambda x: int(x))
    return unsorted

def sort_alg2(unsorted):
    l=list(map(int,unsorted))
    l.sort()
    s=list(map(str,l))
    return s

print(sort_alg1(unsorted))
print(sort_alg2(unsorted))

Both work as expected. However, according to my profiler (I am using the ever-popular line_profiler by Robert Kern), the first function i.e. sort_alg1 is more than 3 times slower in execution as sort_alg2. Now that wouldn't be a big problem if I could pinpoint the reason for this, but I am unable to. I have tried looking up differences in the built-in sort() method and map() function, lambda, etc. all to no avail. If someone could educate me as to why this is happening, that would be really great!

Prithvidiamond
  • 327
  • 2
  • 15

2 Answers2

1

Doing some benchmark:

from timeit import timeit
import random

n = 1_000_000
unsorted = [str(x) for x in range(n)]
random.shuffle(unsorted)


def sort_alg1(unsorted):
    unsorted.sort(key=lambda x: int(x))
    return unsorted


def sort_alg2(unsorted):
    l = list(map(int, unsorted))
    l.sort()
    s = list(map(str, l))
    return s


t1 = timeit(lambda: sort_alg1(unsorted), number=1)
t2 = timeit(lambda: sort_alg2(unsorted), number=1)

print(t1)
print(t2)

Prints:

0.4568827738985419
0.2486396620515734

So it seems that sort_alg2 is faster. But the reason is that sort_alg2 receives already sorted array from sort_alg1. If you slightly change the benchmark:

t1 = timeit(lambda: sort_alg1(unsorted), number=1)
random.shuffle(unsorted)                      # <---- shufle again the array
t2 = timeit(lambda: sort_alg2(unsorted), number=1)

print(t1)
print(t2)

Prints:

0.456114097032696
0.5958397497888654

So first function is faster.

Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
  • 1
    I need to stop coding at 2 in the night, this is the 10th stupid mistake in a week that I have made coding late in the night cause I am deprived of sleep, and I can't code any other time cause I have school. Somebody send me help! Thanks by the way for helping me! One thing that is still bothering me though is why is the first one faster. Is it cause creating a new list each time using the map function is too expensive compared to sorting in place using the key argument? – Prithvidiamond Jul 09 '21 at 17:03
  • Also I made a deepcopy of the original shuffled list and ran each of them through one of the sorting functions and I still get the second one to be faster than the first function, though the margins much slimmer but I get these results consistently (happened everytime over 6 runs) – Prithvidiamond Jul 09 '21 at 17:33
0

unsorted.sort in function sort_alg1 sorts the list in place so that sort_alg2 is not starting out with quite the same version of the unsorted list. Moreover, once sort_alg2 executes the statement l = list(map(int, unsorted)), l is now a completely sorted list if we are sorting by integer values. So l.sort() is going to run in trivial time.

Note the assert statement in the sort_alg2 function:

import random

n = 10000000
unsorted = [str(x) for x in range(n)]
random.shuffle(unsorted)

def sort_alg1(unsorted):
    unsorted.sort(key = lambda x: int(x))
    return unsorted

def sort_alg2(unsorted):
    l=list(map(int,unsorted))
    # make a copy of l:
    l2 = l.copy()
    l.sort()
    assert(l == l2) # proof!
    s=list(map(str,l))
    return s

sort_alg1(unsorted)
sort_alg2(unsorted)
Booboo
  • 38,656
  • 3
  • 37
  • 60