The Question
Is there a straightforward algorithm for figuring out if a variable is "used" within a given scope?
In a Python AST, I want to remove all assignments to variables that are not otherwise used anywhere, within a given scope.
Details
Motivating example
In the following code, it is obvious to me (a human) that _hy_anon_var_1 is unused, and therefore the _hy_anon_var_1 = None statements can be removed without changing the result:
    # Before
    def hailstone_sequence(n: int) -> Iterable[int]:
        while n != 1:
            if 0 == n % 2:
                n //= 2
                _hy_anon_var_1 = None
            else:
                n = 3 * n + 1
                _hy_anon_var_1 = None
            yield n

    # After
    def hailstone_sequence(n: int) -> Iterable[int]:
        while n != 1:
            if 0 == n % 2:
                n //= 2
            else:
                n = 3 * n + 1
            yield n
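For concreteness, the removal step on its own could look like the sketch below. It assumes the set of unused names is already known (finding that set is the actual question), that Python 3.9+ is available for ast.unparse, and that dropping the right-hand side has no side effects. DropAssignments is a name made up for illustration.

```python
import ast


class DropAssignments(ast.NodeTransformer):
    """Delete `name = <expr>` statements whose targets are all in `names`."""

    def __init__(self, names):
        self.names = set(names)

    def visit_Assign(self, node):
        targets_are_unused = all(
            isinstance(target, ast.Name) and target.id in self.names
            for target in node.targets
        )
        # Returning None removes the statement from its parent's body.
        return None if targets_are_unused else node


source = """
def hailstone_sequence(n):
    while n != 1:
        if 0 == n % 2:
            n //= 2
            _hy_anon_var_1 = None
        else:
            n = 3 * n + 1
            _hy_anon_var_1 = None
        yield n
"""

tree = DropAssignments({"_hy_anon_var_1"}).visit(ast.parse(source))
cleaned = ast.unparse(ast.fix_missing_locations(tree))
print(cleaned)
```

A body that ends up empty after removal would still need a pass statement, which this sketch does not handle.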
Bonus version
Extend this to []-lookups with string literals as keys.
In this example, I would expect _hyx_letXUffffX25['x'] to be eliminated as unused, because _hyx_letXUffffX25 is local to h, so _hyx_letXUffffX25['x'] is essentially the same thing as a local variable. I would then expect _hyx_letXUffffX25 itself to be eliminated once there are no more references to it.
    # Before
    def h():
        _hyx_letXUffffX25 = {}
        _hyx_letXUffffX25['x'] = 5
        return 3

    # After
    def h():
        return 3
From what I can tell, this is somewhat of an edge case, but I think the basic algorithmic problem is the same.
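To show what such a lookup looks like in the AST, here is a small sketch that finds stores of the form name['key'] = .... It assumes Python 3.9+, where the subscript's slice is the key expression itself; literal_key_stores is a hypothetical helper name.

```python
import ast


def literal_key_stores(func):
    """Yield (name, key) for each `name['key'] = ...` store inside func."""
    for node in ast.walk(func):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.ctx, ast.Store)
            and isinstance(node.value, ast.Name)
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)
        ):
            yield node.value.id, node.slice.value


source = """
def h():
    _hyx_letXUffffX25 = {}
    _hyx_letXUffffX25['x'] = 5
    return 3
"""

h_def = ast.parse(source).body[0]
stores = list(literal_key_stores(h_def))
print(stores)
```

One wrinkle: inside the subscript store, the name _hyx_letXUffffX25 itself has a Load context (only the Subscript node carries Store), so a rule based purely on Load contexts would count the dict as used.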
Definition of "used"
Assume that no dynamic name lookups are used in the code.
A name is used if any of these are true in a given scope:
- It is referenced anywhere in an expression. Examples include: an expression in a return statement, an expression on the right-hand side of an assignment statement, a default argument in a function definition, being referenced inside a local function definition, etc.
- It is referenced on the left-hand side of an "augmented assignment" statement, i.e. it is an augtarget therein. This might represent "useless work" in a lot of programs, but for the purpose of this task that's OK and distinct from being an entirely unused name.
- It is nonlocal or global. These might be useless nonlocals or globals, but because they reach beyond the given scope, it is OK for my purposes to assume that they are "used".
Please let me know in the comments if this seems incorrect, or if you think I am missing something.
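To make the three rules concrete, here is a minimal sketch of how they could be checked with the ast module. It is deliberately naive: it treats a function and everything nested inside it as one flat namespace, so it errs on the side of calling a name "used" and does not handle shadowing between nested scopes. The helper names (used_names, unused_assigned_names, first_function) are made up for illustration.

```python
import ast


def used_names(func):
    """Names that count as "used" under the three rules above."""
    used = set()
    for node in ast.walk(func):
        # Rule 1: referenced in an expression (Load context).
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            used.add(node.id)
        # Rule 2: target of an augmented assignment.
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            used.add(node.target.id)
        # Rule 3: declared global or nonlocal.
        elif isinstance(node, (ast.Global, ast.Nonlocal)):
            used.update(node.names)
    return used


def unused_assigned_names(func):
    """Names that are stored to somewhere in func but never "used"."""
    assigned = {
        node.id
        for node in ast.walk(func)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }
    return assigned - used_names(func)


def first_function(source):
    """Parse source and return its first top-level statement."""
    return ast.parse(source).body[0]


example = """
def f():
    i = 0
    return 5
"""
print(sorted(unused_assigned_names(first_function(example))))  # ['i']
```

Because nested scopes are flattened, a name that is only shadowed by an inner parameter is reported as used, which is safe for removal purposes but more conservative than the definition intends.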
Examples of "used" and "unused"
Example 1: unused
Variable i in f is unused:
    def f():
        i = 0
        return 5
Example 2: unused
Variable x in f is unused:
    def f():
        def g(x):
            return x / 5
        x = 10
        return g(100)
The name x does appear in g, but the variable x in g is local to g. It shadows the variable x created in f, but the two x names are not the same variable.
Variation
If g has no parameter x, then x is in fact used:
    def f():
        x = 10
        def g():
            return x / 5
        return g()
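The standard-library symtable module already distinguishes these two cases, so it may be a useful building block. The sketch below is only a probe, not a full solution: describe_x_in_g is a hypothetical helper, and it assumes each source string contains exactly one top-level function with exactly one nested function.

```python
import symtable

SHADOWED = """
def f():
    def g(x):
        return x / 5
    x = 10
    return g(100)
"""

CLOSED_OVER = """
def f():
    x = 10
    def g():
        return x / 5
    return g()
"""


def describe_x_in_g(source):
    """Report how the compiler classifies the name x inside g."""
    module_table = symtable.symtable(source, "<example>", "exec")
    f_table = module_table.get_children()[0]
    g_table = f_table.get_children()[0]
    x = g_table.lookup("x")
    return {"parameter": x.is_parameter(), "free": x.is_free()}


print(describe_x_in_g(SHADOWED))     # {'parameter': True, 'free': False}
print(describe_x_in_g(CLOSED_OVER))  # {'parameter': False, 'free': True}
```

When x is free in g, it refers to the x bound in f, so f's x counts as used; when x is a parameter of g, it is a separate variable and says nothing about f's x.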
Example 3: used
Variable i in f is used:
    def f():
        i = 0
        return i
Example 4: used
Variable accum is used in both silly_map and silly_any:
    def silly_map(func, data):
        data = iter(data)
        accum = []
        def _impl():
            try:
                value = next(data)
            except StopIteration:
                return accum
            else:
                # accum is only read here (and mutated in place), never rebound.
                accum.append(func(value))
                return _impl()
        return _impl()
    def silly_any(func, data):
        data = iter(data)
        accum = False
        def _impl():
            nonlocal accum, data
            try:
                value = next(data)
            except StopIteration:
                return accum
            else:
                if value:
                    # Swap in an empty iterator so the next call stops.
                    data = iter([])
                    accum = True
                return _impl()
        return _impl()