25

TL;DR: see the final line; the rest is just preamble.


I am developing a test harness that parses user scripts and generates a Python script, which it then runs. The idea is for non-techie folks to be able to write high-level test scripts.

I have introduced the idea of variables, so a user can use the LET keyword in their script, e.g. LET X = 42, which I simply expand to X = 42. They can then use X later in their scripts, e.g. RELEASE CONNECTION X.

But what if someone writes LET 2 = 3? That's going to generate invalid Python.

If I have that X in a variable variableName, how can I check whether variableName holds a valid Python variable name?

Mawg says reinstate Monica
  • 9
    On the side: Why do you think "LET X = 42" is easier for "non-techie folks" than "X = 42"? – timgeb Mar 31 '16 at 10:36
  • One option is to use a regex. See [Regular expression to confirm whether a string is a valid identifier in Python](http://stackoverflow.com/q/5474008/4014959) – PM 2Ring Mar 31 '16 at 10:41
  • @PM2Ring - Note that that's for Python 2. It's less simple for [Python 3](https://docs.python.org/3.5/reference/lexical_analysis.html#identifiers) (also see [here](https://www.python.org/dev/peps/pep-3131/) and [here](http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html)). – TigerhawkT3 Mar 31 '16 at 10:44
  • @timgeb the answer to that is quite Basic :-) – Mawg says reinstate Monica Jan 03 '18 at 11:14

6 Answers

59

In Python 3 you can use str.isidentifier() to test whether a given string is a valid Python identifier/name.

>>> 'X'.isidentifier()
True
>>> 'X123'.isidentifier()
True
>>> '2'.isidentifier()
False
>>> 'while'.isidentifier()
True

The last example shows that you should also check whether the variable name clashes with a Python keyword:

>>> from keyword import iskeyword
>>> iskeyword('X')
False
>>> iskeyword('while')
True

So you could put that together in a function:

from keyword import iskeyword

def is_valid_variable_name(name):
    return name.isidentifier() and not iskeyword(name)
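
As a rough sketch of how this check might slot into the harness from the question: the `translate_let` helper and its error messages below are hypothetical, not part of the original answer, and the right-hand side of the LET statement is deliberately left unvalidated here.

```python
from keyword import iskeyword


def is_valid_variable_name(name):
    # Valid identifier syntax, and not a reserved word such as 'while'.
    return name.isidentifier() and not iskeyword(name)


def translate_let(line):
    """Turn 'LET X = 42' into 'X = 42' (hypothetical helper).

    Raises ValueError with a readable message if the line is not a LET
    statement or the variable name is not usable in Python.
    """
    keyword_part, _, rest = line.strip().partition(' ')
    if keyword_part != 'LET' or '=' not in rest:
        raise ValueError('not a LET statement: {!r}'.format(line))
    name, _, value = rest.partition('=')
    name = name.strip()
    if not is_valid_variable_name(name):
        raise ValueError('{!r} is not a valid variable name'.format(name))
    # Note: the value is passed through as-is; validating it is a separate job.
    return '{} = {}'.format(name, value.strip())


print(translate_let('LET X = 42'))
```

With this shape, `LET 2 = 3` and `LET while = 1` are rejected before any Python is generated, so the user can be told which name was the problem.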

Another option, which works in Python 2 and 3, is to use the ast module:

from ast import parse

def is_valid_variable_name(name):
    try:
        parse('{} = None'.format(name))
        return True
    except (SyntaxError, ValueError, TypeError):
        return False

>>> is_valid_variable_name('X')
True
>>> is_valid_variable_name('123')
False
>>> is_valid_variable_name('for')
False
>>> is_valid_variable_name('')
False
>>> is_valid_variable_name(42)
False

This will parse the assignment statement without actually executing it. It will pick up invalid identifiers as well as attempts to assign to a keyword. In the above code None is an arbitrary value to assign to the given name - it could be any valid expression for the RHS.
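
Note that parsing `{} = None` accepts any string that happens to form valid code in that position, such as 'a = b' or '[]'. One possible tightening on Python 3, offered as a sketch rather than as part of the original answer, is to parse the string on its own as an expression and inspect the resulting tree. The name `is_plain_name` is made up for this example.

```python
import ast


def is_plain_name(name):
    """Return True only if `name` parses as exactly one bare identifier."""
    try:
        # mode='eval' parses a single expression and executes nothing.
        tree = ast.parse(name, mode='eval')
    except (SyntaxError, ValueError, TypeError):
        # Keywords, empty strings, broken syntax and non-strings end up here.
        return False
    node = tree.body
    # Reject anything that is not a lone Name node spelled exactly as given
    # (this rules out 'a,b', '[]', '(a)' and similar).
    return isinstance(node, ast.Name) and node.id == name
```

The exact-spelling comparison is intentionally strict, so an identifier that Python would normalise to a different spelling is rejected as well.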

mhawke
  • 1
    `compile('{} = None'.format(name), "", "exec")` and `return True` after should be enough – Padraic Cunningham Mar 31 '16 at 11:02
  • 1
    @PadraicCunningham: Thanks. Either works and `ast.parse()` calls `compile()` anyway. I think that it's a little cleaner with `ast.parse()` because there are fewer arguments, although it does require an import. – mhawke Mar 31 '16 at 11:12
  • 5
    For what it's worth, `is_valid_variable_name('a = b')`, `is_valid_variable_name('[]')`, `is_valid_variable_name('*a')` will all return `True`. – vaultah May 05 '17 at 15:48
  • 1
    @vaultah: it's worth a great deal and thanks for finding the flaw in my solution. Worth also noting that this problem only affects the `ast.parse()` solution. AFAIK the first solution still works as expected. – mhawke May 06 '17 at 12:28
  • @mhawke: yes, the first solution should work fine. Sorry, I should have mentioned that – vaultah May 06 '17 at 13:20
  • `is_valid_variable_name("a +") is True` @vultah – Thomas Grainger Sep 04 '20 at 21:57
  • Maybe try parsing a `def` or `lambda` with the tested string as a parameter name, instead of parsing a variable assignment. A lot of these edge cases which parse validly for `{} = None` should get rejected for something like `lambda {}=None: {}`. – mtraceur Jul 18 '22 at 19:03
  • I think there is no *robust and universal* way with `ast.parse`... the `,` character is particularly tricky, as it survives the switch from an assignment to a function parameter. Seems like if you want to be portable, there's no way around either checking the string manually for certain characters, or maybe doing a loop where you `parse` not only `name` but also all prefix substrings `name[:1]` to `name[:-1]`. – mtraceur Jul 19 '22 at 00:52
  • Although... I think we *do* rule out all unsafe-to-evaluate inputs that are valid Python <=2 if we try to `parse('lambda {0}=0: {0}'.format(name))`. Because the only unsafe thing that can validly parse in the first substitution is a default argument assignment, such as in `a=unsafe(),b`, but then the `=` would be an invalid assignment statement in the second substitution, which must be an expression because it's a `lambda` body. (However I haven't ruled out for sure if any Python >=3.* syntax adds something exploitable to the picture... for example `:` to introduce evaluated annotations.) – mtraceur Jul 19 '22 at 01:02
  • So in principle a portable approach might be to use `str.isidentifier(name) and not iskeyword(name)` if the former is available, otherwise fall back to a `parse` and then if that succeeds, you know it's safe to evaluate/execute to rule out the remaining edge-cases such as `a,b` (although if you know all the edge-cases that get through the lambda parse, then maybe it's better to just manually check for those cases). – mtraceur Jul 19 '22 at 01:05
3

EDIT: this is wrong and implementation dependent - see comments.

Just have Python do its own check by making a dictionary with the variable holding the name as the key and splatting it as keyword arguments:

def _dummy_function(**kwargs):
    pass

def is_valid_variable_name(name):
    try:
        _dummy_function(**{name: None})
        return True
    except TypeError:
        return False

Notably, TypeError is consistently raised whenever a dict splats into keyword arguments but has a key which isn't a valid function argument, and whenever a dict literal is being constructed with an invalid key, so this will work correctly on anything you pass to it.

mtraceur
  • `**kwargs` can contain non-valid variable names. E.g. `is_valid_variable_name('[]')` returned True. I was not able to find any string, where this function returns False. Might be different in python 2. – Christoph Boeddeker Jul 18 '22 at 15:24
  • @ChristophBöddeker wild. I have strong memories of this working to reject arguments. But at least on 3.10 it behaves as you describe. – mtraceur Jul 18 '22 at 18:53
  • I found [a Python mailing list discussion](https://mail.python.org/archives/list/python-dev@python.org/thread/GMSIA7ACABIHLPU7AXRIONXSSAEFKSFH/) which indicates this was always the case. So I guess I was just wrong and didn't properly test it with strings. (It does reject non-strings, but that's not what this question is about.) – mtraceur Jul 18 '22 at 18:57
2

I don't think you need exactly the same naming syntax as Python itself. I would rather go for a simple regexp like:

\w+

to make sure it's something alphanumeric, and then add a prefix to keep clear of Python's own syntax. So the non-techie user's declaration:

LET return = 12

should probably become, after your parsing:

userspace_return = 12
or
userspace['return'] = 12
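
A minimal sketch of both variants follows. The helper names are hypothetical, and the `re.ASCII` flag is an assumption added here so that `\w` only matches `[a-zA-Z0-9_]`; the value is passed through unchecked.

```python
import re


def _check_user_name(name):
    # \w+ with re.ASCII: one or more of a-z, A-Z, 0-9 or underscore.
    if not re.fullmatch(r'\w+', name, flags=re.ASCII):
        raise ValueError('{!r} is not a valid variable name'.format(name))


def let_as_prefixed_variable(name, value_source, prefix='userspace_'):
    """Generate e.g. "userspace_return = 12" for LET return = 12."""
    _check_user_name(name)
    # The prefix keeps the generated name away from keywords and builtins.
    return '{}{} = {}'.format(prefix, name, value_source)


def let_as_dict_entry(name, value_source, namespace='userspace'):
    """Generate e.g. "userspace['return'] = 12" for LET return = 12."""
    _check_user_name(name)
    return '{}[{!r}] = {}'.format(namespace, name, value_source)


print(let_as_prefixed_variable('return', '12'))
print(let_as_dict_entry('return', '12'))
```

A side effect of the prefix is that even a name like `2` produces a legal Python identifier (`userspace_2`), so whether to allow such names becomes a decision about the user-facing language rather than about Python.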
ptrk
2

In Python 3, as above, you can simply use str.isidentifier. But in Python 2, this does not exist.

The tokenize module has a regex for names (identifiers): tokenize.Name. But I couldn't find any documentation for it, so it may not be available everywhere. It is simply r'[a-zA-Z_]\w*'. A single $ after it will let you test strings with re.match.
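
One detail worth knowing about the `$` anchor: it also matches just before a trailing newline, so a name followed by a newline still passes. The short sketch below illustrates the difference with `\Z`, which only matches at the very end of the string; the variable names are made up for the example.

```python
import re

NAME = r'[a-zA-Z_]\w*'

loose = re.compile(NAME + r'$')    # '$' tolerates one trailing newline
strict = re.compile(NAME + r'\Z')  # '\Z' matches only at the true end

print(bool(loose.match('abc')))     # True
print(bool(strict.match('abc')))    # True
print(bool(loose.match('abc\n')))   # True
print(bool(strict.match('abc\n')))  # False
```

If the names come from lines of a user script that have not been stripped, the stricter anchor (or stripping the input first) avoids accepting a name with a newline attached.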

The docs say that an identifier is defined by this grammar:

identifier ::=  (letter|"_") (letter | digit | "_")*
letter     ::=  lowercase | uppercase
lowercase  ::=  "a"..."z"
uppercase  ::=  "A"..."Z"
digit      ::=  "0"..."9"

Which is equivalent to the regex above. But we should still import tokenize.Name in case this ever changes. (Which is very unlikely, but maybe in older versions of Python it was different?)

And to filter out keywords, like pass, def and return, use keyword.iskeyword. There is one caveat: None is not a keyword in Python 2, but still can't be assigned to. (keyword.iskeyword('None') in Python 2 is False).

So:

import keyword

if hasattr(str, 'isidentifier'):
    _isidentifier = str.isidentifier
else:
    import re
    _fallback_pattern = '[a-zA-Z_][a-zA-Z0-9_]*'
    try:
        import tokenize
    except ImportError:
        _isidentifier = re.compile(_fallback_pattern + '$').match
    else:
        _isidentifier = re.compile(
            getattr(tokenize, 'Name', _fallback_pattern) + '$'
        ).match

    del _fallback_pattern


def isname(s):
    return bool(_isidentifier(s)) and not keyword.iskeyword(s) and s != 'None'
Artyer
1

You could use exception handling and catch NameError and SyntaxError. Test the generated code inside a try/except block and inform the user if the input is invalid.
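
A minimal sketch of the SyntaxError half of this idea, assuming the check is done with the built-in `compile()` so that nothing is executed. The function name and message wording are hypothetical. A NameError only appears when code actually runs, so it is not covered here, and a successful compile does not prove the input was a single plain name (for instance `a = b = 5` compiles fine).

```python
def check_generated_line(source):
    """Return an error message for the user, or None if the line compiles."""
    try:
        # compile() only parses and compiles the source; it does not run it.
        compile(source, '<user script>', 'exec')
    except SyntaxError as exc:
        return 'Invalid input: {}'.format(exc.msg)
    return None


for line in ('X = 5', '2fg = 5'):
    problem = check_generated_line(line)
    print(line, '->', problem or 'ok')
```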

xiº
1

You could try a test assignment and see if it raises a SyntaxError:

>>> 2fg = 5
  File "<stdin>", line 1
    2fg = 5
      ^
SyntaxError: invalid syntax
snakecharmerb
  • This assumes you are able to evaluate the name in the Python interpreter, and is unsafe for programmatic checking in the general case (`import os; os.replace(malicious_file, important_file); foo` can have `= 5` appended to it and still execute just fine). – mtraceur May 21 '21 at 19:30