
Hi everyone! I am trying to test the creation performance of dictionary-backed objects, but I am getting some weird results. I used three different methods to measure the time needed to create a large number of objects in Python. The first uses the time module, which I know is not very accurate. The test file is "node_time.py":

from __future__ import print_function
from time import time

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0
        return

begin = time()
content = [node(i,i) for i in range(1000000)]
print(time()-begin)

The second method uses the timeit module, which should be a much better choice. The test file is "node_timeit.py":

from __future__ import print_function
from timeit import repeat

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0
        return

cmd = "content = [node(i,i) for i in range(1000000)]"
prepare = "from __main__ import node"
cost = min(repeat(cmd, prepare, repeat=1, number=1))
print(cost)

The third method is to use the Linux system command "time". The test file is "node_sys.py":

from __future__ import print_function

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0
        return

content = [node(i,i) for i in range(1000000)]

Finally, the three results are quite different:

-bash-4.2$ python2 node_time.py
5.93654894829
-bash-4.2$ python2 node_timeit.py
2.6723048687
-bash-4.2$ time python2 node_sys.py
real    0m8.587s
user    0m7.344s
sys     0m0.716s

I would expect the time-module result (which measures wall-clock time) to be the largest of the three, but with the Linux "time" command the sum of user and sys CPU time comes to as much as 8.060 s. Which result is the correct one, and why are they so different? Thanks for any comments!

Sheng
  • They're not exactly equivalent... your `node_timeit` version runs the code 3 times and cherrypicks the best runtime, while the others only get run once... If you're going to compare things, at least compare apples to apples, instead of apples to elephants and whales. – Marc B Feb 15 '14 at 05:07
  • @MarcB Sorry for that. I am updating it. But the min() should not make such a big difference, right? – Sheng Feb 15 '14 at 05:09
  • that depends entirely on what exactly this module is doing in the background. perhaps the first run includes the compilation/parsing overhead, so the actual runtimes are 8.x, 2.x, 2.x. Since you cherry pick the "best" value, you cheat the other attempts by not ignoring THEIR setup costs. – Marc B Feb 15 '14 at 05:12
  • @MarcB Thanks for your great suggestion! I forgot that. But after updating, the results still show great differences. Why? – Sheng Feb 15 '14 at 05:16

1 Answer

The difference between the time and timeit timings exists because, as the timeit documentation notes:

By default, timeit() temporarily turns off garbage collection during the timing.

When you allocate a lot of memory, normally the cyclic garbage collector will kick in to see if it can reclaim some of that. To get more consistent timings, timeit disables this behavior for the duration of the timing.
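
To see the collector actually firing during the allocation, one option (just an illustrative sketch, reusing the node class from the question) is to turn on the gc module's DEBUG_STATS flag, which makes every collection pass print a summary to stderr:

import gc

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0

gc.set_debug(gc.DEBUG_STATS)   # log every collection pass to stderr
content = [node(i, i) for i in range(1000000)]
gc.set_debug(0)                # stop logging

Every collection that shows up while the list comprehension runs is work that the plain time measurement pays for and that the default timeit run skips.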

Compare the timings with time, with and without garbage collection:

>>> import time, timeit, gc   # node is the class defined in the question
>>> def t1():
...   s = time.time()
...   content = [node(i, i) for i in range(1000000)]
...   print time.time() - s
...
>>> t1()
3.27300000191
>>> gc.disable()
>>> t1()
1.92200016975

to the timings with timeit, with and without garbage collection:

>>> gc.enable()
>>> timeit.timeit('content = [node(i, i) for i in range(1000000)]', 'from __main__ import node; import gc; gc.enable()', number=1)
3.2806941528164373
>>> timeit.timeit('content = [node(i, i) for i in range(1000000)]', 'from __main__ import node', number=1)
1.8655694847876134

As you can see, both methods produce the same timing with the same GC settings.

As for the command-line time command, it measures the entire runtime of the program, including interpreter setup and teardown and other work that the in-process timings don't cover. I suspect one of the big contributors to the difference is the time taken to free all the node objects you allocated:

>>> def t2():
...   s = time.time()
...   [node(i, i) for i in range(1000000)]
...   # List and contents are deallocated
...   print time.time() - s
...
>>> t2()
3.96099996567
user2357112
  • Thanks for your answer! If I disable the gc before allocating a lot of memory, and enable it immediately after allocation, will it improve the total performance? Or will it lead to a performance disaster after enabling gc? My simple test showed an improvement. But I am not sure, because I am unfamiliar with gc. Thanks! – Sheng Feb 16 '14 at 04:27
  • @Sheng: It can help, but it isn't guaranteed to, and it can hurt if you're creating reference cycles during that allocation. – user2357112 Feb 16 '14 at 05:51
  • So without creating reference cycles, it will help SOMETIMES, but NEVER make the performance worse, right? – Sheng Feb 16 '14 at 05:58
  • @Sheng: It might mess with the collection scheduling or something. I'm not completely sure. If you need to do it, profile first. If you don't need the performance boost, don't bother. It's an extra complication to keep track of. – user2357112 Feb 16 '14 at 06:00
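
For what it's worth, the pattern being discussed in these last comments, switching the collector off only around the bulk allocation and restoring it afterwards, might look roughly like the sketch below. Here build_nodes is a hypothetical helper (node is the class from the question), and as noted above you should profile before relying on it:

import gc

def build_nodes(n):
    # Hypothetical helper: disable the cyclic collector only while the
    # objects are created, then restore whatever state it was in before.
    # Assumes node is the class defined in the question.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return [node(i, i) for i in range(n)]
    finally:
        if was_enabled:
            gc.enable()

content = build_nodes(1000000)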