I'm sure that if a solution exists for this then it's out there somewhere, but I can't find it. I've followed "Python regex to match a specific word" and had success with the first part, but I'm now struggling with the second.
I've inherited a horrible file format where each test result is on its own line. Records are limited to 12 chars each, so some results are split across groups of lines, e.g. SITE, SITE1 and SITE2. I'm trying to parse the file into a dictionary so I can do more analysis with it and ultimately produce a formatted report.
The link above / code below allows me to match each SITE line and concatenate them together, but it's giving me problems matching INS, INS 1 and INS 2 correctly. Yes, the space is intentional - it's what I have to deal with. INS is the test result and INS 1 is the limit of the test for a pass.
Is there a regular expression that would match
SITE > SITE True but SITE > SITE1 False
and
INS > INS True but INS > INS 1 False?
Here is the Python code:
import re

lines = ['SITE start', 'SITE1 more', 'SITE2 end', 'INS value1', 'INS 1 value2']
headings = ['SITE', 'SITE1', 'SITE2', 'INS', 'INS 1']

for line in lines:
    for heading in headings:
        # Whole-word match for the heading
        headregex = r"\b" + heading + r"\b"
        # Search the line (not the heading itself, which would always match)
        match = re.search(headregex, line)
        if match:
            print("Found " + heading + " " + line)
        else:
            print("Not Found " + heading + " " + line)
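One direction that seems to give the True/False behaviour described above is sketched here; it is not a tested solution for the full file. Each heading is anchored at the start of the line, must be followed by whitespace or the end of the line, and the longest headings are tried first so that INS 1 is checked before INS. The helper name match_heading is made up for illustration, and the sketch assumes an INS value never itself begins with a lone 1 or 2 followed by a space.

```python
import re


def match_heading(line, headings):
    """Return the longest heading that starts `line`, or None (hypothetical helper)."""
    # Longest first, so "INS 1" is tried before "INS" and "SITE1" before "SITE".
    for heading in sorted(headings, key=len, reverse=True):
        # ^ anchors at the start of the line; the lookahead requires whitespace
        # or end of line after the heading, so "SITE" does not match "SITE1 ...".
        pattern = r"^" + re.escape(heading) + r"(?=\s|$)"
        if re.search(pattern, line):
            return heading
    return None


headings = ["SITE", "SITE1", "SITE2", "INS", "INS 1", "INS 2"]
for line in ["SITE startaddy", "SITE1 middle addy", "INS 500 V", "INS 1 >299 MEG P"]:
    print("%s -> %s" % (line, match_heading(line, headings)))
```

The \b approach fails for INS because the space after INS counts as a word boundary, so INS also matches at the start of an INS 1 line; trying the longer heading first sidesteps that.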
And here is some dummy data:
TEST MODE 131 AUTO
SITE startaddy
SITE1 middle addy
SITE2 end addy
USER DB
VISUAL CHECK P
BOND RANGE 25A
EARTH 0.09 OHM P
LIMIT 0.10 OHM
INS 500 V
INS 1 >299 MEG P
...
TEST MODE 231 AUTO
SITE startaddy
SITE1 middle addy
SITE2 end addy
USER DB
VISUAL CHECK P
INS 500 V
INS 2 >299 MEG P
...
Sorry for the horrid formatting - it's copied and pasted from what I am dealing with!