
The Question

Is there a straightforward algorithm for figuring out if a variable is "used" within a given scope?

In a Python AST, I want to remove all assignments to variables that are not otherwise used anywhere, within a given scope.


Details

Motivating example

In the following code, it is obvious to me (a human), that _hy_anon_var_1 is unused, and therefore the _hy_anon_var_1 = None statements can be removed without changing the result:

# Before
def hailstone_sequence(n: int) -> Iterable[int]:
    while n != 1:
        if 0 == n % 2:
            n //= 2
            _hy_anon_var_1 = None
        else:
            n = 3 * n + 1
            _hy_anon_var_1 = None
        yield n

# After
def hailstone_sequence(n: int) -> Iterable[int]:
    while n != 1:
        if 0 == n % 2:
            n //= 2
        else:
            n = 3 * n + 1
        yield n

Bonus version

Extend this to []-lookups with string literals as keys.

In this example, I would expect _hyx_letXUffffX25['x'] to be eliminated as unused, because _hyx_letXUffffX25 is local to h, so _hyx_letXUffffX25['x'] is essentially the same thing as a local variable. I would then expect _hyx_letXUffffX25 itself to be eliminated once there are no more references to it.

# Before
def h():
    _hyx_letXUffffX25 = {}
    _hyx_letXUffffX25['x'] = 5
    return 3

# After
def h():
    return 3

From what I can tell, this is something of an edge case, but I think the basic algorithmic problem is the same.

Definition of "used"

Assume that no dynamic name lookups are used in the code.

A name is used if any of these are true in a given scope:

  1. It is referenced anywhere in an expression. Examples include: an expression in a return statement, an expression on the right-hand side of an assignment statement, a default argument in a function definition, being referenced inside a local function definition, etc.
  2. It is referenced on the left-hand side of an "augmented assignment" statement, i.e. it is an augtarget therein. This might represent "useless work" in a lot of programs, but for the purpose of this task that's OK and distinct from being an entirely unused name.
  3. It is nonlocal or global. These might be useless nonlocals or globals, but because they reach beyond the given scope, it is OK for my purposes to assume that they are "used".

Please let me know in the comments if this seems incorrect, or if you think I am missing something.
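
The three rules above can be sketched directly on the ast module. This is a minimal sketch under stated assumptions, not a complete implementation: analyze and unused_names are hypothetical names, class bodies and comprehensions are not treated as separate scopes, only top-level functions are reported, and unused parameters or unused nested function names are deliberately ignored. Where it is imprecise, it errs on the side of treating a name as used.

```python
import ast

FUNCS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)

def analyze(func):
    """Return (unused, free) name sets for one function or lambda node."""
    a = func.args
    params = (*a.posonlyargs, *a.args, *a.kwonlyargs, a.vararg, a.kwarg)
    bound = {p.arg for p in params if p is not None}
    stored, used, declared, free = set(), set(), set(), set()
    stack = list(func.body) if isinstance(func.body, list) else [func.body]
    while stack:
        node = stack.pop()
        if isinstance(node, FUNCS):
            # Nested scope: only the names it takes from outside count as uses here.
            used |= analyze(node)[1]
            if not isinstance(node, ast.Lambda):
                bound.add(node.name)
                stack.extend(node.decorator_list)
            # Default values are evaluated in the enclosing scope (rule 1).
            stack.extend(node.args.defaults)
            stack.extend(d for d in node.args.kw_defaults if d is not None)
            continue
        if isinstance(node, (ast.Global, ast.Nonlocal)):
            declared.update(node.names)  # rule 3
            if isinstance(node, ast.Nonlocal):
                free.update(node.names)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            used.add(node.target.id)  # rule 2
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)  # rule 1
            elif isinstance(node.ctx, ast.Store):
                stored.add(node.id)
        stack.extend(ast.iter_child_nodes(node))
    local = (bound | stored) - declared
    return stored - used - declared, free | (used - local)

def unused_names(source):
    """Map each top-level function name to its assigned-but-never-used names."""
    tree = ast.parse(source)
    return {n.name: sorted(analyze(n)[0])
            for n in tree.body if isinstance(n, FUNCS[:2])}

print(unused_names("def f():\n    i = 0\n    return 5\n"))  # {'f': ['i']}
```

The recursion returns the free names of each nested function, so a name that a nested function shadows with its own parameter never counts as a use in the enclosing function.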

Examples of "used" and "unused"

Example 1: unused

Variable i in f is unused:

def f():
    i = 0
    return 5

Example 2: unused

Variable x in f is unused:

def f():
    def g(x):
        return x/5
    x = 10
    return g(100)

The name x does appear in g, but the variable x in g is local to g. It shadows the variable x created in f, but the two x names are not the same variable.

Variation

If g has no parameter x, then x is in fact used:

def f():
    x = 10
    def g():
        return x/5
    return g()
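
The difference between the two versions can be checked with the standard-library symtable module, which reports how the compiler classifies each name. The following is only an illustration of the shadowing point; describe_x_in_g and the two source strings are hypothetical names introduced for this sketch.

```python
import symtable

SHADOWED = """
def f():
    def g(x):
        return x/5
    x = 10
    return g(100)
"""

CLOSURE = """
def f():
    x = 10
    def g():
        return x/5
    return g()
"""

def describe_x_in_g(source):
    """Report how the name x is classified inside the nested function g."""
    top = symtable.symtable(source, "<example>", "exec")
    f = top.get_children()[0]  # symbol table of f
    g = f.get_children()[0]    # symbol table of g, nested in f
    x = g.lookup("x")
    return {"parameter": x.is_parameter(), "free": x.is_free()}

print(describe_x_in_g(SHADOWED))  # {'parameter': True, 'free': False}
print(describe_x_in_g(CLOSURE))   # {'parameter': False, 'free': True}
```

When x is a parameter of g, it is a separate variable and tells us nothing about the x in f. When x is free in g, it resolves to the x in f, so that x counts as used.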

Example 3: used

Variable i in f is used:

def f():
    i = 0
    return i

Example 4: used

Variable accum is used in both silly_map and silly_any:

def silly_map(func, data):
    data = iter(data)
    accum = []

    def _impl():
        try:
            value = next(data)
        except StopIteration:
            return accum
        else:
            accum.append(value)
            return _impl()

    return _impl()

def silly_any(func, data):
    data = iter(data)
    accum = False

    def _impl():
        nonlocal accum, data
        try:
            value = next(data)
        except StopIteration:
            return accum
        else:
            if value:
                data = []
                accum = True
            else:
                return _impl()

    return _impl()
shadowtalker
  • Well, flake8/pyflakes can do it (search for `UnusedVariable`)... https://github.com/PyCQA/pyflakes/blob/master/pyflakes/checker.py – AKX Aug 10 '21 at 14:06
  • Also, the PyCharm IDE flags the unused variables in your examples 1 and 2. – AKX Aug 10 '21 at 14:08
  • Oh, and `autoflake` also has `--remove-unused-variables`. https://github.com/myint/autoflake – AKX Aug 10 '21 at 14:10
  • @AKX excellent, thank you! If I figure out a standalone solution I will post it here. – shadowtalker Aug 10 '21 at 14:22

1 Answer


The solution below works in two passes. First, the syntax tree of the source is traversed to find every assignment statement whose target is never used. Second, the tree is traversed again with a custom ast.NodeTransformer subclass, which removes those assignment statements. Both passes are repeated until no unused assignments remain, because removing one assignment can leave another name unused (in the bonus example, the dictionary only becomes unused once its item assignment is gone). The final source is then written out.

The ast traverser class:

import ast, itertools, collections as cl
class AssgnCheck:
   def __init__(self, scopes = None):
      self.scopes = scopes or cl.defaultdict(list)
   @classmethod
   def eq_ast(cls, a1, a2):
      #check that two `ast`s are the same
      if type(a1) != type(a2):
         return False
      if isinstance(a1, list):
         return all(cls.eq_ast(*i) for i in itertools.zip_longest(a1, a2))
      if not isinstance(a1, ast.AST):
         return a1 == a2
      return all(cls.eq_ast(getattr(a1, i, None), getattr(a2, i, None)) 
                 for i in set(a1._fields)|set(a2._fields) if i != 'ctx')
   def check_exist(self, t_ast, s_path):
      #traverse the scope stack and remove scope assignments that are discovered in the `ast`
      s_scopes = []
      for _ast in t_ast:
         for sid in s_path[::-1]:
            s_scopes.extend(found:=[b for _, b in self.scopes[sid] if AssgnCheck.eq_ast(_ast, b) and \
                all(not AssgnCheck.eq_ast(j, b) for j in s_scopes)])
            self.scopes[sid] = [(a, b) for a, b in self.scopes[sid] if b not in found]
   def traverse(self, _ast, s_path = [1]):
      #walk the ast object itself
      _t_ast = None
      if isinstance(_ast, ast.Assign): #if assignment statement, add ast object to current scope
         self.traverse(_ast.targets[0], s_path)
         self.scopes[s_path[-1]].append((True, _ast.targets[0]))
         _ast = _ast.value
      if isinstance(_ast, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
         s_path = [*s_path, (nid:=(1 if not self.scopes else max(self.scopes)+1))]
         if isinstance(_ast, (ast.FunctionDef, ast.AsyncFunctionDef)):
            self.scopes[nid].extend([(False, ast.Name(i.arg)) for i in _ast.args.args])
            _t_ast = [*_ast.args.defaults, *_ast.body]
      self.check_exist(_t_ast if _t_ast is not None else [_ast], s_path) #determine if any assignment statement targets have previously defined names
      if _t_ast is None:
         for _b in _ast._fields:
            if isinstance((b:=getattr(_ast, _b)), list):
               for i in b:
                  self.traverse(i, s_path)
            elif isinstance(b, ast.AST):
               self.traverse(b, s_path)
      else:
         for _ast in _t_ast:
            self.traverse(_ast, s_path)

Putting it all together:

class Visit(ast.NodeTransformer):
   def __init__(self, asgn):
       super().__init__()
       self.asgn = asgn
   def visit_Assign(self, node):
       #remove assignment nodes marked as unused
       if any(node.targets[0] == i for i in self.asgn):
          return None
       return node

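def remove_assgn(f_name):
   #read and parse the source, closing the file once it has been read
   with open(f_name) as f:
      tree = ast.parse(f.read())
   while True:
      #repeat until a pass finds no unused assignment targets
      r = AssgnCheck()
      r.traverse(tree)
      if not (unused:=[j for b in r.scopes.values() for k, j in b if k]):
         break
      tree = Visit(unused).visit(tree)
   return ast.unparse(tree)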

print(remove_assgn('test_name_assign.py'))
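
One caveat when deleting statements from a tree: if every statement in a block is removed, the block is left empty and the unparsed result is no longer valid Python. Below is a minimal sketch of one way to guard against that by padding emptied blocks with pass. It is not part of the solution above: DropAssignments and drop_assignments are hypothetical names, names are matched globally rather than per scope, and the right-hand sides are assumed to have no side effects.

```python
import ast

class DropAssignments(ast.NodeTransformer):
    """Remove `name = value` statements for the given names, padding emptied blocks."""

    def __init__(self, unused):
        self.unused = set(unused)

    def visit_Assign(self, node):
        # Drop the statement only if every target is a plain name marked unused.
        if all(isinstance(t, ast.Name) and t.id in self.unused for t in node.targets):
            return None
        return node

    def generic_visit(self, node):
        node = super().generic_visit(node)
        body = getattr(node, "body", None)
        # A compound statement whose body became empty would not unparse to valid code.
        if isinstance(body, list) and not body and not isinstance(node, ast.Module):
            node.body = [ast.Pass()]
        return node

def drop_assignments(source, unused):
    """Return `source` with assignments to the names in `unused` removed."""
    tree = DropAssignments(unused).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(drop_assignments("def f(flag):\n    if flag:\n        tmp = None\n    return 5\n", {"tmp"}))
```

Here the if block would otherwise be left without a body after tmp = None is removed.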

Output Samples

Contents of test_name_assign.py:

def hailstone_sequence(n: int) -> Iterable[int]:
    while n != 1:
        if 0 == n % 2:
            n //= 2
            _hy_anon_var_1 = None
        else:
            n = 3 * n + 1
            _hy_anon_var_1 = None
        yield n

Output:

def hailstone_sequence(n: int) -> Iterable[int]:
    while n != 1:
        if 0 == n % 2:
            n //= 2
        else:
            n = 3 * n + 1
        yield n

Contents of test_name_assign.py:

def h():
    _hyx_letXUffffX25 = {}
    _hyx_letXUffffX25['x'] = 5
    return 3

Output:

def h():
    return 3

Contents of test_name_assign.py:

def f():
    i = 0
    return 5

Output:

def f():
    return 5

Contents of test_name_assign.py:

def f():
    x = 10
    def g():
        return x/5
    return g()

Output:

def f():
    x = 10
    def g():
        return x / 5
    return g()
Ajax1234
  • Thanks! How does this work without explicitly keeping a stack of lexical scopes as you traverse the AST? That's what Pyflakes appears to do. The `AssgnCheck` implementation is somewhat dense. – shadowtalker Aug 10 '21 at 21:27
  • @shadowtalker My current solution is quite basic with regard to scoping: `asgn` tracks all assignments throughout the entire code for use later on. I can update my solution to have stricter scoping rules; however, I could use a bit of clarification first. In your example 2, suppose `g` did not have a parameter `x`, and that `x = 10` was declared before `g`, rather than after. Would `x` still be "unused", since it is in the scope of `f`, or would it be used, because if the example were to be run, `x/5` would access the `x` from the scope of `f`? – Ajax1234 Aug 10 '21 at 21:54
  • It would be the same if `x = 10` is before or after the definition of `g`, at least in CPython. In example 2, if `g` has no parameter `x`, then `x` is used, because it appears in an expression. The reason it's not used in the example is that it's shadowed by the parameter `x` in the definition of `g`. – shadowtalker Aug 11 '21 at 01:21
  • @shadowtalker Please see my recent edit, I greatly expanded my earlier answer to include scope handling and some comments. – Ajax1234 Aug 11 '21 at 03:15
  • Wow! I'll have to spend some time understanding what you did. – shadowtalker Aug 11 '21 at 06:27