I have been reading up for a few hours trying to understand membership testing and its speed, having fallen down that rabbit hole. I thought I had it figured out until I ran my own little timeit test.

Here's the code:

import timeit

range_ = range(20, -1, -1)
w = timeit.timeit('0 in {seq}'.format(seq=list(range_)))
x = timeit.timeit('0 in {seq}'.format(seq=tuple(range_)))
y = timeit.timeit('0 in {seq}'.format(seq=set(range_)))
z = timeit.timeit('0 in {seq}'.format(seq=frozenset(range_)))
print('list:', w)
print('tuple:', x)
print('set:', y)
print('frozenset:', z)

and here is the result:

list: 0.3762843
tuple: 0.38087859999999996
set: 0.06568490000000005
frozenset: 1.5114070000000002

List and tuple having the same time makes sense. I thought set and frozenset would have the same time as well, but frozenset is extremely slow, even compared to lists?

Changing the code to the following still gives me similar results:

list_ = list(range(20, -1, -1))
tuple_ = tuple(range(20, -1, -1))
set_ = set(range(20, -1, -1))
frozenset_ = frozenset(range(20, -1, -1))

w = timeit.timeit('0 in {seq}'.format(seq=list_))
x = timeit.timeit('0 in {seq}'.format(seq=tuple_))
y = timeit.timeit('0 in {seq}'.format(seq=set_))
z = timeit.timeit('0 in {seq}'.format(seq=frozenset_))
user1021085
  • I'd guess because there's no literal form. What happens if you test *without* the creation? – jonrsharpe Sep 10 '19 at 20:29
  • @jonrsharpe Sorry, what do you mean without the creation? – user1021085 Sep 10 '19 at 20:31
  • You're timing how long it takes to convert the range to each of the object types AND look up the value, not just timing the lookup – G. Anderson Sep 10 '19 at 20:31
  • So instead of creating a new frozenset each time (which also creates a set; again, there's no literal form), *just* include the membership test in the loop. – jonrsharpe Sep 10 '19 at 20:32
  • Cannot reproduce: on my machine (with Python 2.7) `set` and `frozenset` are about on par with each other. – NPE Sep 10 '19 at 20:34
  • @NPE I'm using Python 3.7, if that helps make sense of it? – user1021085 Sep 10 '19 at 20:37
  • @G.Anderson I added new code. That only times the membership check, right? The results are the same though. (Or very similar, but each run gives slightly different results. 0.38 to 0.375 etc) – user1021085 Sep 10 '19 at 20:42
  • Your new code still suffers the same problem as the original one, because for the frozenset case, `'0 in {seq}'.format(seq=frozenset_)` gives `'0 in frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20})'`, and so just as @jonrsharpe said, you're building a set, then making a frozenset, and only _then_ testing membership. – DSM Sep 10 '19 at 20:42
  • Try `timeit('0 in thing', setup='thing = {seq}'.format(seq=frozenset_))` – jonrsharpe Sep 10 '19 at 20:44
  • @jonrsharpe Your code resulted in 0.0663 so that is much closer to the set time. But since I wrote the code for set the same way why didn't it also get a long time such as 1.5? Only frozenset was affected – user1021085 Sep 10 '19 at 20:48
  • Because the others are literals, the frozenset test had to build a set and then call a function (well, initialise a class). – jonrsharpe Sep 10 '19 at 20:49
  • @jonrsharpe excuse me if I'm slow at getting this but google tells me there are no set literals, I don't quite understand why frozenset does all these things that set does not. Tried to google answers. – user1021085 Sep 10 '19 at 21:14
  • I doubt Google tells you that, given e.g. `{1, 2, 3}` (since Python 2.7 and 3.1). There's no *empty* set literal, as `{}` is a dictionary. Just look at the strings you're building, `{0, 1, 2, ...}` vs. `frozenset({0, 1, 2, ...})`. – jonrsharpe Sep 10 '19 at 21:18
  • I was referring to this http://buildingskills.itmaybeahack.com/book/python-2.6/html/p02/p02c06_sets.html – user1021085 Sep 10 '19 at 21:19
  • Why are you learning Python from a book aimed at Python 2.6? The *most recent* version of Python 2.6 is [almost six years old](https://www.python.org/downloads/release/python-269/), and Python 2 support generally [ends in a few months](https://pythonclock.org/). See https://sopython.com/wiki/What_tutorial_should_I_read%3F. – jonrsharpe Sep 10 '19 at 21:22
  • Ah sorry I didn't realize that. I was just googling things about sets, I am not using the book. I've watched some free MIT course using Python 3 and then googling things as I need them or come across them (as I stumbled across some thread about membership tests) – user1021085 Sep 10 '19 at 21:23

2 Answers

It's not the membership test, it's the construction that's taking the time.

Consider the following:

import timeit

list_ = list(range(20, -1, -1))
tuple_ = tuple(range(20, -1, -1))
set_ = set(range(20, -1, -1))
frozenset_ = frozenset(range(20, -1, -1))

w = timeit.timeit('0 in list_', globals=globals())
x = timeit.timeit('0 in tuple_', globals=globals())
y = timeit.timeit('0 in set_', globals=globals())
z = timeit.timeit('0 in frozenset_', globals=globals())

print('list:', w)
print('tuple:', x)
print('set:', y)
print('frozenset:', z)

I get the following timings with Python 3.5:

list: 0.28041897085495293
tuple: 0.2775509520433843
set: 0.0552431708201766
frozenset: 0.05547476885840297
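
The `setup` argument suggested in the comments is another way to keep construction out of the timed statement. The following is only a minimal sketch, assuming the same 21-element range; the name `thing` and the `number=100000` repeat count are illustrative choices, not part of the original benchmark:

```python
import timeit

# Build the frozenset once in setup, so only the membership test is timed.
prebuilt = timeit.timeit(
    '0 in thing',
    setup='thing = frozenset(range(20, -1, -1))',
    number=100000,
)

# Rebuild the frozenset inside the timed statement, as the question's code did:
# the formatted string is '0 in frozenset({0, 1, ..., 20})'.
rebuilt = timeit.timeit(
    '0 in {seq}'.format(seq=frozenset(range(20, -1, -1))),
    number=100000,
)

print('prebuilt:', prebuilt)
print('rebuilt: ', rebuilt)
```

The absolute numbers depend on the machine and Python version, so treat them as a relative comparison only.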

The following will demonstrate why frozenset is so much slower by disassembling the code you're benchmarking:

import dis

def print_dis(code):
  print('{code}:'.format(code=code))
  dis.dis(code)

range_ = range(20, -1, -1)
print_dis('0 in {seq}'.format(seq=list(range_)))
print_dis('0 in {seq}'.format(seq=tuple(range_)))
print_dis('0 in {seq}'.format(seq=set(range_)))
print_dis('0 in {seq}'.format(seq=frozenset(range_)))

Its output is pretty self-explanatory:

0 in [20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]:
  1           0 LOAD_CONST               0 (0)
              3 LOAD_CONST              21 ((20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE
0 in (20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0):
  1           0 LOAD_CONST               0 (0)
              3 LOAD_CONST              21 ((20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE
0 in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}:
  1           0 LOAD_CONST               0 (0)
              3 LOAD_CONST              21 (frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}))
              6 COMPARE_OP               6 (in)
              9 RETURN_VALUE
0 in frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20}):
  1           0 LOAD_CONST               0 (0)
              3 LOAD_NAME                0 (frozenset)
              6 LOAD_CONST               0 (0)
              9 LOAD_CONST               1 (1)
             12 LOAD_CONST               2 (2)
             15 LOAD_CONST               3 (3)
             18 LOAD_CONST               4 (4)
             21 LOAD_CONST               5 (5)
             24 LOAD_CONST               6 (6)
             27 LOAD_CONST               7 (7)
             30 LOAD_CONST               8 (8)
             33 LOAD_CONST               9 (9)
             36 LOAD_CONST              10 (10)
             39 LOAD_CONST              11 (11)
             42 LOAD_CONST              12 (12)
             45 LOAD_CONST              13 (13)
             48 LOAD_CONST              14 (14)
             51 LOAD_CONST              15 (15)
             54 LOAD_CONST              16 (16)
             57 LOAD_CONST              17 (17)
             60 LOAD_CONST              18 (18)
             63 LOAD_CONST              19 (19)
             66 LOAD_CONST              20 (20)
             69 BUILD_SET               21
             72 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             75 COMPARE_OP               6 (in)
             78 RETURN_VALUE
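
The same difference can be checked programmatically instead of by reading the listing. This is a rough sketch rather than a definitive test: `has_call` is a hypothetical helper, and it assumes that call opcodes keep a `CALL` prefix, since the exact names (`CALL_FUNCTION`, `CALL`, ...) vary between Python versions:

```python
import dis

def has_call(source):
    """Return True if the compiled expression contains a call instruction."""
    code = compile(source, '<string>', 'eval')
    # Match on the common prefix because opcode names changed across versions.
    return any(ins.opname.startswith('CALL')
               for ins in dis.get_instructions(code))

print(has_call('0 in {0, 1, 2}'))
print(has_call('0 in frozenset({0, 1, 2})'))
```

Only the second expression should report a call, which matches the disassembly above.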
NPE

This is because, among the 4 data types you converted the range object into, frozenset is the only one in Python 3 whose repr is not a literal but a call expression, so evaluating it requires a name lookup. Name lookups are expensive because they require hashing the string of the name and then looking it up through the local, global and then built-in namespaces, and here the lookup is followed by a function call on every timed iteration:

>>> repr(list(range(3)))
'[0, 1, 2]'
>>> repr(tuple(range(3)))
'(0, 1, 2)'
>>> repr(set(range(3)))
'{0, 1, 2}'
>>> repr(frozenset(range(3)))
'frozenset({0, 1, 2})' # requires a name lookup when evaluated by timeit

In Python 2, sets also require a name lookup when converted by repr, which is why @NPE reported in the comment that there is little difference in performance between a frozenset and a set in Python 2:

>>> repr(set(range(3)))
'set([0, 1, 2])'
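
One way to see which of the formatted strings needs a name at run time is to inspect `co_names` on the compiled expression. This is a small sketch for Python 3; `names_needed` is a hypothetical helper introduced only for illustration:

```python
def names_needed(source):
    """Return the names the compiled expression has to look up at run time."""
    return compile(source, '<string>', 'eval').co_names

for seq in (list(range(3)), tuple(range(3)), set(range(3)), frozenset(range(3))):
    # Same string construction as in the question's benchmark.
    source = '0 in {seq}'.format(seq=seq)
    print(source, names_needed(source))
```

Only the frozenset string should list a name (`frozenset`); the other three compile to expressions that reference no names at all.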
blhsing
  • How come it does a name lookup? And by name lookup you mean it, what, looks for "frozenset" in the method/loop (if it is in one) then the file then built-in things? – user1021085 Sep 10 '19 at 21:16
  • @user1021085 to *lookup* the *name* `frozenset` – jonrsharpe Sep 10 '19 at 21:18
  • Because `timeit.timeit` calls `exec` to compile and execute the given string as a Python statement/expression, which has to look up the name `frozenset` to resolve it as a function object. – blhsing Sep 10 '19 at 21:19
  • @blhsing but it doesn't need to do this for set (in python 3) or list or tuple? Is this because they are more 'integrated' for lack of a better way to say it in Python itself? – user1021085 Sep 10 '19 at 21:21
  • It doesn't need to do this for list or tuple because you're formatting the list or tuple with the `format` method first, which turns the list or tuple object into a string by calling `repr` on it, which produces the list or tuple in its literal representation with no name for `exec` to resolve. – blhsing Sep 10 '19 at 21:23
  • so (), [] and for set {}, but there's no similar literal thing for frozenset so it needs to look it up? For example printing a tuple gives me `('a', 'b', 'c')` while printing a frozenset gives me `frozenset({'c', 'a', 'b'})` – user1021085 Sep 10 '19 at 21:29
  • Yes, that's exactly the point. Evaluating `frozenset({'c', 'a', 'b'})` is actually calling `frozenset` as a function, and passing a set as an argument to it, while evaluating `('a', 'b', 'c')` is simply loading the tuple from the constant table directly. – blhsing Sep 10 '19 at 22:11
  • @blhsing my only question would be why doesn't frozenset have its own.. I don't know what to call it, literal characters? – user1021085 Sep 10 '19 at 22:18
  • I guess it has to do with the available symbols to choose from. There are only so many symbols after all, and only `{}`, `()`, `[]` and `<>` can reasonably be used as enclosures, but since all the former 3 are taken, and `<` and `>` are already used as comparison operators where `1<2>0` is making a valid chained comparison, it leaves no good option for a `frozenset` literal to use another pair of symbols as enclosure without making the code look ugly, confusing or ambiguous. – blhsing Sep 10 '19 at 22:30
  • One last question I promise heh, but I was thinking of when you use a set, it can "peephole optimize(?)" and change it into a frozenset. Does all the name lookup happen behind the scenes then? – user1021085 Sep 10 '19 at 22:35
  • When you use a set literal with no names within, it is indeed optimized as a `frozenset` so that it can be loaded as a constant directly, but that optimization is done at the compilation time so there is no penalty during runtime. Your `frozenset` on the other hand gets converted into a function call to `frozenset` at runtime so there is the overhead of a name lookup and a function call. – blhsing Sep 10 '19 at 22:45