
I have a function that runs inside a fairly tight loop, so it's performance sensitive. It's a filtering function intended to save more expensive work. Most of the function is just a membership check against a static list.

So (stripping out some irrelevant detail) I could do two different things here:

    def my_filter(arg):
        SAFE_VALS = {
            'a',
            'b',
            'f',
            'i',
            'm'
        }
        # the real list is much larger, but
        # it is a static set like the above...

        return arg in SAFE_VALS

or

    # make this a module level constant
    SAFE_VALS = {
        'a',
        'b',
        'f',
        'i',
        'm'
    }

    def my_filter(arg):
        return arg in SAFE_VALS

I know that Python will have to look one scope level higher to find the module level version -- but what I don't know is whether the compiled version of my_filter effectively recreates this set every time the function is run or if the literal is "baked in" to the function def. If every invocation allocates a new set, it feels like I'd be better off with the module level version -- if not, I don't gain anything by hoisting the literal out of the function scope.

Right now the profiling data is noisy enough that I don't see a clear difference. But if that set includes a much larger number of longish strings, will I? Or is there just no significant difference between these forms?

theodox

1 Answer


As a matter of language semantics, {'a', 'b', 'f', 'i', 'm'} means "make a new set with these contents". The language semantics allow reusing an existing object for immutable types, but sets are mutable.

As an implementation detail, CPython currently optimizes expressions of the form x in {stuff}, where the right-hand side is a set expression containing only constants. In that case, Python will precompute a frozenset and save it in the code object's co_consts. Saving the set to a variable prevents the optimization.
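One way to check this on your own interpreter is to look at the bytecode. The sketch below is only illustrative: inline_filter, local_filter and opnames are made-up names, and the result depends on the CPython version you run it on. It looks for a BUILD_SET instruction, which is what constructs a fresh set at call time.

```python
import dis


def inline_filter(arg):
    # Set display used directly as the right-hand side of `in`.
    return arg in {'a', 'b', 'f', 'i', 'm'}


def local_filter(arg):
    # Same contents, but bound to a local name before the test.
    safe_vals = {'a', 'b', 'f', 'i', 'm'}
    return arg in safe_vals


def opnames(func):
    """Return the set of bytecode instruction names used by func."""
    return {ins.opname for ins in dis.get_instructions(func)}


# If the optimization applies, the inline form has no BUILD_SET and
# carries a frozenset among its constants instead.
print('inline builds a set:', 'BUILD_SET' in opnames(inline_filter))
print('local builds a set: ', 'BUILD_SET' in opnames(local_filter))
print('inline frozenset constants:',
      [c for c in inline_filter.__code__.co_consts if isinstance(c, frozenset)])
```

Calling dis.dis(inline_filter) prints the full instruction listing if you want to see exactly what your version emits.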

To guarantee the set is not recomputed, independent of implementation details, you should precompute the set yourself and save SAFE_VALS as a global variable.
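If you want numbers rather than bytecode, a micro-benchmark along these lines may help. It is a rough sketch, not a rigorous measurement: global_filter, local_filter and measure are hypothetical names, the five-element set stands in for your real one, and the timings will vary with machine and interpreter.

```python
import timeit

# Built once, when the module is imported.
SAFE_VALS = {'a', 'b', 'f', 'i', 'm'}


def global_filter(arg):
    # Only a global name lookup and a membership test per call.
    return arg in SAFE_VALS


def local_filter(arg):
    # The set is rebuilt on every call before the membership test.
    safe_vals = {'a', 'b', 'f', 'i', 'm'}
    return arg in safe_vals


def measure(func, number=100_000):
    """Return total seconds taken by `number` calls of func('m')."""
    return timeit.timeit(lambda: func('m'), number=number)


for func in (global_filter, local_filter):
    print(f'{func.__name__}: {measure(func):.4f} s')
```

The cost of rebuilding the set grows with the number of elements, so rerunning this with a set closer to your real size should give a more representative comparison.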

user2357112