Extract variables using python regex

Question

The input file contains the following lines:

a=b*c;
d=a+2;
c=0;
b=a;

Now, for each line, I want to extract the variables that are used. For example, for line 1 the output should be [a,b,c]. Currently I am doing the following:

var = ['a', 'b', 'c', 'd']     # list of known variables
for line in file_ptr:          # file_ptr: the open input file
    if '=' in line:
        temp = line.split('=')
        ans = list(temp[0])    # variable on the left-hand side
        if '+' in temp[1]:
            pass  # do something
        elif '*' in temp[1]:
            pass  # do something
        else:
            pass  # single variable as in line 4, OR constant as in line 3

Is it possible to do this using regex?

EDIT:

Expected output for the above file:

[a,b,c]
[d,a]
[c]
[a,b]

What output would you expect from the input you've specified? — Robᵩ, May 04 '16 at 21:16

score 1 · Accepted Answer · answered May 04 '16 at 21:20


I would use re.findall() with whatever pattern matches variable names in the example's programming language. Assuming a typical language, this might work for you:

import re

lines = '''a=b*c;
d=a+2;
c=0;
b=a;'''

for line in lines.splitlines():
    print(re.findall(r'[_a-z][_a-z0-9]*', line, re.I))
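Here is a Python 3 sketch of the same approach applied to an open file, as in the question's `file_ptr` loop. To keep the example self-contained, `io.StringIO` stands in for the real file. This is an assumption: with a real file you would iterate over the object returned by `open()` instead.

```python
import io
import re

# Stand-in for the question's input file; swap in the real file object.
file_ptr = io.StringIO('a=b*c;\nd=a+2;\nc=0;\nb=a;\n')

# An identifier: a letter or underscore, then letters, digits, or underscores.
pattern = re.compile(r'[_a-z][_a-z0-9]*', re.I)

results = []
for line in file_ptr:
    names = pattern.findall(line)  # names in left-to-right order
    results.append(names)
    print(names)
```

findall() returns names in the order they appear, so the last line yields ['b', 'a'].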

Robᵩ

[This answer](http://stackoverflow.com/a/24541431/2564301) contains the explanation @AkaSh is looking for. Python's variable names are case sensitive, so it needs the Ignore Case flag (or, alternatively, just a few `A-Z`s). I'd throw in a few `\b` for consistency, though. – Jongware May 04 '16 at 21:24
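The following is a small sketch of the `\b` suggestion from the comment above. The line `x=2e5+y;` is a made-up example. It shows why the boundary matters: without `\b`, the pattern also matches the `e5` inside the numeric literal `2e5`.

```python
import re

line = 'x=2e5+y;'  # hypothetical line containing a float literal in exponent form

# Without \b, matching can start in the middle of "2e5".
without_boundary = re.findall(r'[_a-z][_a-z0-9]*', line, re.I)
# With \b, a match must start at a word boundary, so "e5" is skipped.
with_boundary = re.findall(r'\b[_a-z][_a-z0-9]*', line, re.I)

print(without_boundary)  # ['x', 'e5', 'y']
print(with_boundary)     # ['x', 'y']
```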

score 1 · Answer 2 · answered May 04 '16 at 21:24

I'd use a shorter pattern for matching variable names:

import re
strs = ['a=b*c;', 'd=a+2;', 'c=0;', 'b=a;']
print([re.findall(r'[_a-z]\w*', x, re.I) for x in strs])


Pattern matches:

[_a-z] - a _ or an ASCII letter (either case, because of the case-insensitive flag re.I)
\w* - zero or more word characters: letters, digits, or underscores (in Python 3 this includes non-ASCII letters and digits unless re.ASCII is set)
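The same pattern can be written with `re.VERBOSE`, so that each part carries its explanation from the list above as a comment. The `re.ASCII` flag is an optional addition: it restricts `\w` to `[a-zA-Z0-9_]`, which matches the description above exactly.

```python
import re

identifier = re.compile(r'''
    [_a-z]   # first character: underscore or ASCII letter (any case, via re.I)
    \w*      # then zero or more word characters
''', re.I | re.VERBOSE | re.ASCII)

strs = ['a=b*c;', 'd=a+2;', 'c=0;', 'b=a;']
print([identifier.findall(x) for x in strs])
# [['a', 'b', 'c'], ['d', 'a'], ['c'], ['b', 'a']]
```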


score 0 · Answer 3 · answered May 04 '16 at 20:57

If you want just the variables, then do this:

answer = []
for line in file_ptr :
    temp = []
    for char in line:
        if char.isalpha():
            temp.append(char)
    answer.append(temp)

A word of caution, though: this works only with variables that are exactly one character long. More details about str.isalpha() are in the Python documentation.
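The loop above can be extended to multi-character names without regex. The approach is to collect runs of letters, digits, and underscores, then discard runs that start with a digit (numeric constants such as the `2` in line 2). This sketch assumes that names follow the usual rule of starting with a letter or underscore. `extract_names` is a hypothetical helper name.

```python
def extract_names(line):
    """Return identifier-like tokens from one line, scanning char by char."""
    names = []
    current = ''

    def flush(token):
        # Keep the token only if it is non-empty and does not start with a digit.
        if token and not token[0].isdigit():
            names.append(token)

    for char in line:
        if char.isalnum() or char == '_':
            current += char          # extend the current run
        else:
            flush(current)           # any other character ends the run
            current = ''
    flush(current)                   # a run may end at the end of the line
    return names


for line in ['a=b*c;', 'd=a+2;', 'c=0;', 'b=a;', 'count2=x_1*10;']:
    print(extract_names(line))
```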

score 0 · Answer 4 · answered May 04 '16 at 20:58


I'm not entirely sure what you're after, but you can do something like this:

re.split(r'[^\w]', line)

to split the line at every non-word character, leaving the names (plus an empty string after the trailing `;`):

>>> re.split(r'[^\w]', 'a=b*c;')
['a', 'b', 'c', '']

Daniel Roseman


`a`, `b`, `c` are variables. These are elements of the list `var`. – AkaSh May 04 '16 at 21:00
I'm sorry, I have no idea what you mean. – Daniel Roseman May 04 '16 at 21:01

This fails on the examples with digits; digits are also 'word characters' and so would be included. It can trivially be fixed, though, for example with `[^A-Za-z]` instead of `[^\w]` (Python's `re` does not support POSIX classes such as `[[:alpha:]]`). However, for 'any' variable name, you need one expression for the first character and another for the following ones, because `a0` is a valid variable name. – Jongware May 04 '16 at 21:12
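Below is a minimal sketch of a fix for the digit problem, assuming that variable names follow the usual identifier rule. It splits on runs of non-word characters, then drops empty strings and any token that starts with a digit, so `a0` survives while the constant `2` does not. `split_variables` is a hypothetical helper name. `[^\W\d_]` is shown as one way to spell a letters-only class in Python's `re`.

```python
import re

def split_variables(line):
    # Split on runs of non-word characters, e.g. '=', '+', ';'.
    tokens = re.split(r'\W+', line)
    # Drop the empty strings and numeric constants such as '2'.
    return [t for t in tokens if t and not t[0].isdigit()]

print(re.split(r'[^\w]', 'd=a+2;'))  # ['d', 'a', '2', '']
print(split_variables('d=a+2;'))     # ['d', 'a']
print(split_variables('a0=b*2;'))    # ['a0', 'b']

# Letters only: "not (non-word or digit or underscore)".
print(re.findall(r'[^\W\d_]', 'a0_b'))  # ['a', 'b']
```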

score 0 · Answer 5 · answered May 04 '16 at 21:16


This is how I did it:

import re
l = re.split(r'[^A-Za-z]', 'a=b*2;')   # ['a', 'b', '', '']
l = list(filter(None, l))              # drop empty strings; list() is needed in Python 3
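If the variables are known in advance, as with the `var` list in the question, the tokens from this split can also be checked against that list. This sketch assumes that the names are stored as strings. `known_vars` and `used_variables` are hypothetical names.

```python
import re

known_vars = {'a', 'b', 'c', 'd'}  # the question's `var` list, stored as strings

def used_variables(line):
    # Split on anything that is not an ASCII letter and drop empty strings,
    # then keep only names that appear in the known set.
    tokens = filter(None, re.split(r'[^A-Za-z]', line))
    return [t for t in tokens if t in known_vars]

for line in ['a=b*c;', 'd=a+2;', 'c=0;', 'b=a;']:
    print(used_variables(line))
```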

AkaSh
