Running time of "in" on a dict compared to "=="

Question

In an exam, I was asked which of the following functions is most likely to run faster:

def f1():
    a = []
    for j in range(100000): a.append(j*j)
    for j in range(100000):
        if 99999*j == a[j]:
            print("yes")

def f3():
    d = {}
    for j in range(100000): d[j] = j*j
    for j in range(100000):
        if 99999*j in d:
            print("yes")

It's obvious that both in and == are O(1).

But still, I was surprised to see that f1 runs faster.

Any explanation will be appreciated.

The two pieces of code do different things altogether. – Martijn Pieters, May 16 '16 at 10:17
Please do not vandalise your posts. – AdrianHHH, May 17 '17 at 21:01

Martijn Pieters · Answer 1 · 2017-05-19T02:54:14.123

The constant costs are different. Just because two algorithms are O(1) does not make them equivalent. For one, dictionaries are only O(1) on average, while list lookup is O(1), always.

You need to analyse the differences, but you are making a wrong assumption here. f1 looks up an element by index in a list, then tests for equality. f3 tests whether an element is a key in a dictionary. However, a dictionary membership test involves hashing and a test for equality (if there is an object at that location).
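To make the two code paths visible, here is a minimal sketch. The Probe class and lookup_steps function are hypothetical names introduced only for illustration; the class records which special methods Python calls on it. It assumes CPython's usual dict behaviour of comparing keys only when the stored hash matches.

```python
class Probe:
    """Hypothetical key type that logs the special methods a lookup triggers."""

    def __init__(self, value):
        self.value = value
        self.log = []

    def __hash__(self):
        self.log.append("hash")
        return hash(self.value)

    def __eq__(self, other):
        self.log.append("eq")
        return self.value == other


def lookup_steps():
    a = [j * j for j in range(100)]
    d = {j: j * j for j in range(100)}

    in_list = Probe(9)
    _ = in_list == a[3]      # f1 style: index into the list, then compare

    hit = Probe(9)
    _ = hit in d             # f3 style: key present, so hash and then compare

    miss = Probe(99999 * 5)
    _ = miss in d            # f3 style: key absent, hashing still happens

    return in_list.log, hit.log, miss.log


if __name__ == "__main__":
    for name, log in zip(("list ==", "dict hit", "dict miss"), lookup_steps()):
        print(name, log)
```

The list comparison never hashes anything, while every dictionary membership test pays for a hash first, whether or not the key is found.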

So the real difference here is that of hashing versus a list lookup. The list lookup wins because its cost is always O(1), whereas hashing can be O(N) in the size of the object being hashed. The difference explains the timings you see:

>>> import timeit
>>> a = [j * j for j in range(100000)]
>>> timeit.timeit('a[5000]', 'from __main__ import a', number=10**7)
0.26793562099919654
>>> timeit.timeit('_h(5000)', '_h = hash', number=10**7)
0.4080043680005474
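If you want to time the two checks from the question directly rather than the isolated operations, something like the following sketch should work. The function name time_checks and its parameters are my own, not from the question, and the numbers will vary by machine and Python version, so no expected output is given.

```python
import timeit


def time_checks(n=100000, repeats=10):
    """Time the second loop of f1 and f3 without the print calls."""
    a = [j * j for j in range(n)]
    d = {j: j * j for j in range(n)}

    def list_check():
        hits = 0
        for j in range(n):
            if 99999 * j == a[j]:   # index lookup, then equality
                hits += 1
        return hits

    def dict_check():
        hits = 0
        for j in range(n):
            if 99999 * j in d:      # hash, probe, then equality on a hit
                hits += 1
        return hits

    t_list = timeit.timeit(list_check, number=repeats)
    t_dict = timeit.timeit(dict_check, number=repeats)
    return t_list, t_dict


if __name__ == "__main__":
    t_list, t_dict = time_checks()
    print(f"list index + ==: {t_list:.3f}s")
    print(f"dict membership: {t_dict:.3f}s")
```

Building the list and dict outside the timed functions keeps the setup cost out of the measurement, so only the lookups are compared.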

The actual cost of hashing is normally averaged out over all dictionary membership lookups; together with the possible worst-case scenario of O(N) lookups for dictionaries, that makes dictionary lookups only average O(1). List lookups on the other hand are always O(1).

score 0 · Answer 2 · edited May 23 '17 at 11:33

It's obvious that both in and == are O(1).

‒ No, it's not.

The complexity of in for dicts and sets is O(1) on average and, in the worst case (with a lot of collisions), up to O(N). For a better understanding of how hash tables, dicts, sets and so on work, please read "How are Python's Built In Dictionaries Implemented".
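A small sketch of that worst case, assuming a hypothetical Key class (not part of any library) that can be told to return the same hash for every instance. It counts how many equality comparisons a single failed lookup needs.

```python
class Key:
    """Hypothetical key type that counts __eq__ calls across all instances."""

    eq_calls = 0

    def __init__(self, value, constant_hash):
        self.value = value
        self.constant_hash = constant_hash

    def __hash__(self):
        # With constant_hash=True every key collides with every other key.
        return 1 if self.constant_hash else hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


def eq_calls_for_missing_key(n, constant_hash):
    d = {Key(i, constant_hash): i for i in range(n)}
    Key.eq_calls = 0  # ignore comparisons made while building the dict
    found = Key(-5, constant_hash) in d
    assert not found
    return Key.eq_calls


if __name__ == "__main__":
    print("distinct hashes:", eq_calls_for_missing_key(200, False))
    print("all colliding:  ", eq_calls_for_missing_key(200, True))
```

With distinct hashes the lookup can reject the key without comparing it to anything, while with every key colliding it has to be compared against each stored key, which is where the O(N) worst case comes from.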


answered May 16 '16 at 10:26

mikebutrimov
