Why does Python take more time to process a sorted list than an unsorted list?

Question

Example:

import cProfile, random, copy
def foo(lIn): return [i*i for i in lIn]
lIn = [random.random() for i in range(1000000)]
lIn1 = copy.copy(lIn)
lIn2 = sorted(lIn1)
cProfile.run('foo(lIn)')
cProfile.run('foo(lIn2)')

Result:

3 function calls in 0.075 seconds

Ordered by: standard name


   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.005    0.005    0.075    0.075 <string>:1(<module>)
        1    0.070    0.070    0.070    0.070 test.py:716(foo)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

3 function calls in 0.143 seconds

Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.006    0.006    0.143    0.143 <string>:1(<module>)
        1    0.137    0.137    0.137    0.137 test.py:716(foo)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

It doesn't really seem to have anything to do with the sort. You can do `random.shuffle(lIn1)` instead of the sort and `cProfile.run('foo(lIn1)')` and you'll get the same result. — sneep, Mar 21 '18 at 09:01
Maybe the first list is still in cache? And you are using `lIn`, not `lIn1` in the first test call. — Graipher, Mar 21 '18 at 09:34
[why-is-copying-a-shuffled-list-much-slower](https://stackoverflow.com/questions/42107442/why-is-copying-a-shuffled-list-much-slower) — xws, Mar 21 '18 at 09:50
Before the shuffle, the objects at adjacent indices were allocated next to each other on the heap, so the memory hit rate is high when they are accessed in order. After the shuffle, the objects at adjacent indices of the new list are no longer adjacent in memory, so the hit rate is very poor. — xws, Mar 21 '18 at 10:08
That sounds like either a good self-answer or a reason to close as duplicate :) — Graipher, Mar 21 '18 at 11:31
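
One rough way to check the locality explanation from the comments is to look at how far apart consecutive elements sit in memory. The sketch below is for Python 3 and assumes CPython, where `id()` happens to return the object's address (an implementation detail, not a language guarantee). The helper name `mean_address_gap` is made up for this illustration.

```python
import random

def mean_address_gap(values):
    """Average absolute distance between the id() of consecutive elements.

    Assumption: CPython, where id() is the object's memory address.
    """
    gaps = [abs(id(b) - id(a)) for a, b in zip(values, values[1:])]
    return sum(gaps) / len(gaps)

random.seed(0)
original = [random.random() for _ in range(100_000)]  # floats allocated one after another
reordered = sorted(original)                          # same objects, different order

print("original order:", mean_address_gap(original))
print("sorted order:  ", mean_address_gap(reordered))
```

If the explanation holds, the second number should be far larger than the first: iterating the sorted list jumps around the heap, while iterating the original list walks it roughly front to back.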

score 0 · Answer 1 · answered Mar 21 '18 at 09:46

Not really an answer yet, but the comment margin is a bit too small for this.

Since random.shuffle() yields the same result, I decided to implement my own shuffle function and vary the number of swaps it performs. (In the example below, that number is the argument to xrange, 300000.)

def my_shuffle(array):
    for _ in xrange(300000):
        rand1 = random.randint(0, 999999)
        rand2 = random.randint(0, 999999)
        array[rand1], array[rand2] = array[rand2], array[rand1]

The other code is pretty much unmodified:

import cProfile, random, copy
def foo(lIn): return [i*i for i in lIn]
lIn = [random.random()*100000 for i in range(1000000)]
lIn1 = copy.copy(lIn)
my_shuffle(lIn1)
cProfile.run('foo(lIn)')
cProfile.run('foo(lIn1)')

The results I got for the second cProfile depended on the number of times I shuffled:

    swaps   seconds
    10000     0.062
   100000     0.082
   200000     0.099
   400000     0.122
   800000     0.137
  8000000     0.141
 10000000     0.141
100000000     0.248

It looks like the more you mess an array up, the longer operations take, up to a certain point. (I don't know about the last result. It took so long that I did some light other stuff in the background and don't really want to retry.)
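
For anyone who wants to repeat the experiment on Python 3, here is a minimal sketch using `timeit` instead of `cProfile`. The names `square_all`, `partial_shuffle` and `time_after_swaps` are made up for this example, and the swap counts are arbitrary. Taking the minimum of several runs is meant to reduce noise from background activity.

```python
import random
import timeit

def square_all(values):
    # Same work as foo() in the question.
    return [v * v for v in values]

def partial_shuffle(values, swaps, rng=random):
    """Return a copy of `values` with `swaps` random pairwise swaps applied."""
    out = list(values)
    n = len(out)
    for _ in range(swaps):
        i = rng.randrange(n)
        j = rng.randrange(n)
        out[i], out[j] = out[j], out[i]
    return out

def time_after_swaps(values, swaps, repeat=5):
    """Best-of-`repeat` time in seconds to square a partially shuffled copy."""
    shuffled = partial_shuffle(values, swaps)
    return min(timeit.repeat(lambda: square_all(shuffled), number=1, repeat=repeat))

if __name__ == "__main__":
    data = [random.random() for _ in range(1_000_000)]
    for swaps in (0, 10_000, 100_000, 1_000_000):
        print(swaps, round(time_after_swaps(data, swaps), 3))
```

The absolute numbers will depend on the machine and Python version, so only the trend across swap counts is worth comparing.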
