
I have an application where I need to build a list or a dictionary and speed is important. Normally I would just declare a list of zeros of the appropriate length and assign values one at a time but I need to be able to check the length and have it still be meaningful.

Would it be faster to add a key value pair to a dictionary or to append a value to a list? The length of the lists and dictionary will usually be small (less than 100) but this isn't always true and in worst case could be much larger.

I could also just have a variable to keep track of where I am in the list if both of these operations are too slow.
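For reference, the third option described above (a pre-allocated list plus a position variable) might be sketched like this; the names and the doubling strategy are illustrative assumptions, not a definitive implementation:

```python
# Sketch of the "pre-allocated list plus position counter" idea:
# the list is created once at a guessed capacity, and `n` tracks how
# many slots are actually in use, so length checks stay meaningful.
values = [0] * 100  # guessed initial capacity
n = 0               # number of slots actually filled

def add(value):
    global n
    if n == len(values):              # out of room: double the capacity
        values.extend([0] * len(values))
    values[n] = value
    n += 1

for i in range(5):
    add(i * i)

print(n)            # 5 -- the meaningful "length"
print(values[:n])   # [0, 1, 4, 9, 16]
```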

thebjorn
MattTheSnake
    My guess is that appending to a list will be faster, since you don't need the extra `key` object..., but as with all performance queries you'll have to measure against your data. – thebjorn Sep 22 '16 at 14:19
    Measure it with the `timeit` module. https://docs.python.org/2/library/timeit.html – Klaus D. Sep 22 '16 at 14:22
  • ... Note that `list` operations are designed to be amortized O(1). I.e. by pre-allocating the list you aren't really saving that much time, just a fraction of it. Dicts require hashing, so keep in mind that the more complex the key object is, the more time it will take to hash, and thus `dict` will become slower and slower, while `list` doesn't care. Also `dict` lookups are significantly slower (again: you need to first hash the key). BTW: you know that `set` is *exactly* just a `dict` with no values? So if you want a hashing solution use `set` and avoid setting fake values. – Bakuriu Sep 22 '16 at 14:28
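Bakuriu's `set` suggestion can be illustrated with a minimal sketch, assuming only membership and count matter (the data here is made up):

```python
# A set is effectively a dict without values, so when you only need
# membership tests and a meaningful length, it avoids storing
# placeholder values.
seen = set()
for word in ["a", "b", "a", "c"]:
    seen.add(word)

print(len(seen))      # 3 -- duplicates are ignored
print("b" in seen)    # True -- average-case O(1) lookup
```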

2 Answers

3

The best way is to use `time()` to check your execution time.

In the following example, `dict` is slightly faster.

from time import time

st_time = time()
b = dict()
for i in range(1, 10000000):
    b[i] = i          # dict insert

print(time() - st_time)

st_time = time()
a = []
for i in range(1, 10000000):
    a.append(i)       # list append

print(time() - st_time)

1.45600008965
1.52499985695
saurabh baid
    On my system properly using `timeit` instead of using `time` explicitly `list.append` is slightly faster. Also note that you could do: `a = list(range(1, 1000000))` which significantly reduces the timings. – Bakuriu Sep 22 '16 at 14:25
  • why aren't you using the `timeit` module (which is created for exactly these kinds of micro-benchmarks). Also, your data isn't similar to what the OP describes.. – thebjorn Sep 22 '16 at 14:25
  • You should also always test performance code inside a function (I'm guessing global variable lookup has a significant impact on your results). – thebjorn Sep 22 '16 at 14:27
  • in `timeit` I was not able to pass an expression: `timeit(b['a']=1, 100000)` gives `SyntaxError: keyword can't be an expression` – saurabh baid Sep 22 '16 at 14:28
  • I am just trying to give an idea; he can apply the logic to his own code. – saurabh baid Sep 22 '16 at 14:30
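Following the advice in the comments, a `timeit`-based rerun with the loops inside functions might look something like this; the helper names are made up, the sizes are arbitrary, and absolute timings will depend on the machine:

```python
import timeit

def fill_dict(n):
    # Insert n key-value pairs one at a time.
    d = {}
    for i in range(n):
        d[i] = i
    return d

def fill_list(n):
    # Append n values one at a time.
    a = []
    for i in range(n):
        a.append(i)
    return a

n = 100000
dict_time = timeit.timeit(lambda: fill_dict(n), number=20)
list_time = timeit.timeit(lambda: fill_list(n), number=20)
print("dict: %.3fs  list: %.3fs" % (dict_time, list_time))
```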
0

Another option is a deque: purpose-built for fast appends and pops (especially `popleft`).

Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction.

Caveat: read access is slower than a list or dict if you're trying to access items in the middle.

However, I was surprised that for adding new items, dict was at least as fast:

python -m timeit -s "from collections import deque; d = deque()" "for i in range(10000000):" " d.append(i)"
1 loop, best of 5: 459 msec per loop

python -m timeit -s "l = list()" "for i in range(10000000):" " l.append(i)"
1 loop, best of 5: 517 msec per loop

python -m timeit -s "d = dict()" "for i in range(10000000):" " d[i] = i"
1 loop, best of 5: 450 msec per loop
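As a quick usage sketch of those deque properties (the values are arbitrary):

```python
from collections import deque

d = deque()
for i in range(5):
    d.append(i)        # O(1) append on the right

print(d.popleft())     # 0 -- O(1) pop from the left
print(d.pop())         # 4 -- O(1) pop from the right
print(len(d))          # 3 -- len() is O(1) too
print(d[1])            # 2 -- but indexing toward the middle is O(n)
```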
fantabolous