The input file contains the following lines:

a=b*c;
d=a+2;
c=0;
b=a;

For each line, I want to extract the variables that are used. For example, for line 1 the output should be [a,b,c]. Currently I am doing the following:

var = ['a', 'b', 'c', 'd']     # list of variables
for line in file_ptr:
    if '=' in line:
        temp = line.split('=')
        ans = list(temp[0])
        if '+' in temp[1]:
            pass  # do something
        elif '*' in temp[1]:
            pass  # do something
        else:
            pass  # single variable as in line 4, or constant as in line 3

Is it possible to do this using regex?

EDIT:

Expected output for the above file:

[a,b,c]
[d,a]
[c]
[a,b]
AkaSh

5 Answers

I would use re.findall() with whatever pattern matches variable names in the example's programming language. Assuming a typical language, this might work for you:

import re

lines = '''a=b*c;
d=a+2;
c=0;
b=a;'''

for line in lines.splitlines():
    print(re.findall(r'[_a-z][_a-z0-9]*', line, re.I))
Robᵩ
  • [This answer](http://stackoverflow.com/a/24541431/2564301) contains the explanation @AkaSh is looking for. Python's variable names are case sensitive, so it needs the Ignore Case flag (or, alternatively, just a few `A-Z`s). I'd throw in a few `\b` for consistency, though. – Jongware May 04 '16 at 21:24

I'd use a shorter pattern for matching variable names:

import re
strs = ['a=b*c;', 'd=a+2;', 'c=0;', 'b=a;']
print([re.findall(r'[_a-z]\w*', x, re.I) for x in strs])

See the Python demo

Pattern matches:

  • [_a-z] - a _ or an ASCII letter (upper- or lowercase, because of the case-insensitive flag re.I)
  • \w* - 0 or more alphanumeric or underscore characters.

See the regex demo
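
As a minimal sketch of how the pattern above behaves on the question's input, the snippet below wraps it in a helper. The name `variables_in` and the extra sample line `total_1=rate2*x;` are made up for illustration, and the leading `\b` is an addition (suggested in a comment on another answer) so that a letter directly following a digit is not picked up as a name.

```python
import re

# Pattern from the answer, plus a leading \b (an addition, not part of
# the original answer) so that "2x" does not yield "x".
NAME = re.compile(r'\b[_a-z]\w*', re.I)

def variables_in(line):
    """Return the identifier-like tokens in line, in order of appearance."""
    return NAME.findall(line)

lines = ['a=b*c;', 'd=a+2;', 'c=0;', 'b=a;', 'total_1=rate2*x;']
for line in lines:
    print(variables_in(line))
```

Note that `findall` returns names in the order they appear, so `b=a;` gives `['b', 'a']` rather than the `[a,b]` shown in the question.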

Wiktor Stribiżew

If you want just the variables, then do this:

answer = []
for line in file_ptr :
    temp = []
    for char in line:
        if char.isalpha():
            temp.append(char)
    answer.append(temp)

A word of caution though: this would work only with variables that are exactly 1 character in length. More details about isalpha() can be found here or here.
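
To make the caveat concrete, here is a small sketch of the same character-by-character idea. The helper name `letters_in` and the second sample line are hypothetical. With a multi-character name, the letters come back as separate items and the digit is dropped.

```python
def letters_in(line):
    """Collect every alphabetic character in line, one item per character."""
    return [ch for ch in line if ch.isalpha()]

single = letters_in('a=b*c;')   # one-letter names: works as intended
multi = letters_in('ab=c1;')    # 'ab' is split in two and '1' is lost

print(single)
print(multi)
```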

lesingerouge

I'm not entirely sure what you're after, but you can do something like this:

re.split(r'[^\w]', line)

to give a list of the runs of word characters in the line (plus an empty string for the trailing separator):

>>> re.split(r'[^\w]', 'a=b*c;')
['a', 'b', 'c', '']
Daniel Roseman
  • `a`, `b`, `c` are variables. These are elements of the list `var`. – AkaSh May 04 '16 at 21:00
  • I'm sorry, I have no idea what you mean. – Daniel Roseman May 04 '16 at 21:01
  • This fails on the examples with digits; digits are also 'word characters' and so would be included. It can trivially be fixed, though, especially if Python's `re` supports `[[:alpha:]]`. However, for 'any' variable name, you need a different expression for just the first character and all next ones, because `a0` is a valid variable name. – Jongware May 04 '16 at 21:12
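
The point raised in the last comment can be sketched as follows. The standard-library `re` module does not provide POSIX classes such as `[[:alpha:]]`, so this sketch uses `[^\W\d]` (a word character that is not a digit) for the first character instead. The variable names `split_result` and `name_result` and the sample line `d=a0+2;` are illustrative only.

```python
import re

# Splitting on non-word characters keeps the constant '2' and leaves an
# empty string after the trailing ';'.
split_result = re.split(r'[^\w]', 'd=a+2;')
print(split_result)

# One possible fix: the first character must be a letter or underscore
# ([^\W\d]); the following characters may also be digits (\w*).
name_result = re.findall(r'[^\W\d]\w*', 'd=a0+2;')
print(name_result)
```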

This is how I did it:

import re
l = re.split(r'[^A-Za-z]', 'a=b*2;')
l = list(filter(None, l))  # drop the empty strings left by the split
AkaSh