
I would like some reusable code that correctly determines whether a string is a valid variable name in Python 3 (checking against the currently running Python would suffice).

For example:

Given the string group_찇籸딥햳㸙濮ᚨ麍ڵថ, the function should return True, because it can be used as a Python variable name. Given group_찇籸딥-햳㸙濮ᚨ麍ڵថ, it should return False, because the hyphen is parsed as a minus operator and the string cannot be used as a single variable name.

>>> group_찇籸딥햳㸙濮ᚨ麍ڵថ = 4
>>> group_찇籸딥-햳㸙濮ᚨ麍ڵថ = 4
  File "<stdin>", line 1
SyntaxError: can't assign to operator

I would prefer a solution that avoids unsafe evals. A previously attempted solution returned True if the regex ^[\d\W]|[^\w] matched, but this seems to be incomplete, since it misidentifies the first example given above as invalid.

Claudio
AlanSE
  • Hello @AlanSE, I see you want to avoid `eval`, but what about the `literal_eval` function from the `ast` package ? – Bertrand Gazanion Jun 03 '19 at 14:54
  • @BertrandGazanion That would be fine to use. Even with the use of that I'm still somewhat unsure of how to address this in an airtight way. – AlanSE Jun 03 '19 at 14:57
  • I tried the ast.literal_eval approach but it is not working for the first example for me. – Netwave Jun 03 '19 at 15:10

1 Answer


Check out the is_pyname(s) function below, which determines whether a string is a valid variable name in Python 3 using three methods:

  • the regex module (third-party, not part of the Python distribution), which supports \p{L} and \p{N}, matching Unicode characters classified as Letters and Numbers respectively (not only decimal digits), combined with an appropriate pattern

  • running exec(s + '=None') in a try/except block

  • checking whether tokeninfo.string == s, where tokeninfo is produced by the tokenize module

def is_pyname(s, method=None):
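    if method == 'regex' or method is None: 
        import keyword
        import regex
        # ^-- third-party module; supports \p{L} and \p{N}, which match
        # ^-- Unicode chars considered to be Letters and Numbers
        # fullmatch avoids the IndexError that findall(...)[0] raises when
        # nothing matches; the first class also accepts a leading '_'
        is_name_by_regex = (
            regex.fullmatch(r"[\p{L}_][\p{L}\p{N}_]*", s) is not None
            and not keyword.iskeyword(s))  # keywords such as 'if' are not names
        if method == 'regex':
            return is_name_by_regex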
    # ---
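    if method == 'exec' or method is None:
        # NOTE: exec really runs the string, so use this method only on
        # trusted input; it is not a safe check for arbitrary strings
        is_name_by_exec = False 
        try:
            exec(s+'=None')
            is_name_by_exec = True
        except SyntaxError:
            pass
        if method == 'exec':
            return is_name_by_exec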
    # ---
    if method == 'tokens' or method is None:
        from io       import BytesIO  as bio
        from keyword  import iskeyword
        from token    import NAME
        from tokenize import tokenize as tknz
        is_name_by_tokens = False
        for tokeninfo in tknz(bio(s.encode()).readline): 
            # require a NAME token spanning the whole string; comparing only
            # .string would also accept the leading ENCODING token ('utf-8')
            if tokeninfo.type == NAME and tokeninfo.string == s:
                is_name_by_tokens = not iskeyword(s)  # 'if' is a NAME token too
                break
        if method == 'tokens':
            return is_name_by_tokens
    # ---
    assert is_name_by_regex == is_name_by_exec == is_name_by_tokens
    #print('assertion OK')
    is_name = is_name_by_regex
    return is_name


print(is_pyname('찇籸딥햳㸙濮ᚨ麍ڵថ'))  # True
print(is_pyname('찇籸딥-햳㸙濮ᚨ麍ڵថ')) # False
print(is_pyname('7seven'))           # False
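
Since the question asks to avoid unsafe evals, here is a minimal standard-library sketch that never executes the string. It is not one of the three methods above: it relies on str.isidentifier() and keyword.iskeyword(), and the function name is_valid_variable_name is made up for illustration. It checks against the rules of the currently running Python.

```python
import keyword


def is_valid_variable_name(s: str) -> bool:
    """Return True if s can be used as a variable name in the running Python.

    Hypothetical helper for illustration. str.isidentifier() applies the
    interpreter's own identifier rules (including non-ASCII letters), and
    keyword.iskeyword() rejects reserved words such as 'class' or 'if',
    which are identifiers lexically but cannot be assigned to.
    """
    return s.isidentifier() and not keyword.iskeyword(s)


if __name__ == "__main__":
    for candidate in ("group_찇籸딥햳㸙濮ᚨ麍ڵថ", "group_찇籸딥-햳㸙濮ᚨ麍ڵថ", "7seven", "class"):
        print(repr(candidate), is_valid_variable_name(candidate))
```

Unlike the exec-based method, this only inspects the string, so it should be safe to call on untrusted input.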

Claudio