
I need to extract the names of variables from a function string.

A variable name matches [a-zA-Z0-9]+, but it must not be a real-number literal such as 1, 3.5, 1e4, 1e5...

Is there a smart way of doing this?

Here's a minimal (not) working example in Python:

import re
pattern = r"[a-zA-Z0-9.]+"  # naive attempt: also matches numbers such as 2.5
function_string = "(A+B1)**2.5"
print(re.findall(pattern, function_string))

The above code returns:

A, B1 and 2.5.

My desired output is

A and B1.

And here's a nice way of testing the regular expressions: https://regex101.com/r/fv0DfR/1

Bastian
  • So what is the question? The desired output should be? – DirtyBit Jan 30 '19 at 09:56
  • The [regex pattern for C variables](https://stackoverflow.com/questions/12993187/regular-expression-to-recognize-variable-declarations-in-c) might be of use. – meowgoesthedog Jan 30 '19 at 10:00
  • Does a variable have to start with a non-numeric character? I.e., is 12a valid? – JGNI Jan 30 '19 at 10:15
  • @JGNI good point, I will think about that. Variables can't start with a digit in my application (Python), but at the same time the entire function string is probably erroneous if any of the contained expressions starts with a digit. – Bastian Jan 30 '19 at 17:01

2 Answers

import re
pattern = r'[a-zA-Z_][a-zA-Z0-9_]{0,31}'
function_string = "(A+B1)**2.5"

print(re.findall(pattern, function_string))

OUTPUT:

['A', 'B1']
DirtyBit

Try this regex:

\b(?!\d)[a-zA-Z0-9]+


Explanation:

  • \b - matches a word boundary
  • (?!\d) - negative lookahead that makes sure the next character is not a digit, so the variable name cannot start with a digit. This also excludes tokens like 1e3: the leading 1 is rejected by the lookahead, and there is no word boundary between 1 and e for a match to start at.
  • [a-zA-Z0-9]+ - matches 1+ letters or digits

If you also want to match variables that start with a digit but are otherwise alphanumeric, while still excluding numbers such as 1 or 1e4, you can use \b(?!\d+(?:[eE]\d+)?\b)[a-zA-Z0-9]+
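
To see how the two patterns differ, here is a small sketch that runs both on a few inputs. The helper name `variable_names` and the sample strings other than the one from the question are made up for illustration.

```python
import re

# Pattern from this answer: names that do not start with a digit.
NO_LEADING_DIGIT = re.compile(r"\b(?!\d)[a-zA-Z0-9]+")

# Variant from this answer: a leading digit is allowed, but tokens that are
# plain numbers (digits, optionally followed by e/E and more digits) are skipped.
ALLOW_LEADING_DIGIT = re.compile(r"\b(?!\d+(?:[eE]\d+)?\b)[a-zA-Z0-9]+")


def variable_names(pattern, function_string):
    """Return every match of the compiled pattern (hypothetical helper)."""
    return pattern.findall(function_string)


# The first sample is from the question; the other two are illustrative.
for sample in ["(A+B1)**2.5", "x*1e4 + y*1E5", "12a + 3"]:
    print(sample)
    print("  no leading digit:   ", variable_names(NO_LEADING_DIGIT, sample))
    print("  allow leading digit:", variable_names(ALLOW_LEADING_DIGIT, sample))
```

Both patterns return A and B1 for the string in the question and skip 1e4 and 1E5. They only disagree on a token like 12a, which the first pattern drops entirely and the second one keeps.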

Bastian
Gurmanjot Singh
  • It fails for `1E4`, if that counts. Although it's okay since the OP did not mention anything else, cheers – DirtyBit Jan 30 '19 at 10:18
  • @user5173426 Updated – Gurmanjot Singh Jan 30 '19 at 10:19
  • mmh I was a bit fast at accepting your answer but I see you're adjusting still, thank you. It would be good indeed to have both `1e4` and `1E4` excluded, but I don't care about hexadecimals, nor complex numbers etc. Will edit my question. – Bastian Jan 30 '19 at 10:21