This is my current source code for parsing the message from a fire department pager using regular expressions. Everything is working as it should except the pAddress line.
import re
sInput = '(CUPE123, CUPE124, MTVW211, MTVW215, SUNV5326) ALARM-STRUC (Alarm Type THERMAL SMOKE) (Box 12345) APPLE INC - 1 INFINITE LOOP CUPERTINO. (XStr DE ANZA BLVD/MARIANI AVE) .BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED. #F987654321'
# Matches truck names using the consistent four uppercase letters followed by three to four digits.
pTrucks = ','.join(re.findall(r'\w[A-Z]{3}\d[0-9]{2,3}', sInput))
# Matches source and job type using the - as a guide. This section is always preceded by the trucks on the job,
# so it is always preceded by a ) and a space. Allows between 2-8 characters either side of the - to
# allow such variations as 911-RESC, FAA-AIRCRAFT etc.
pJobSource = ''.join(re.findall(r'\) ([A-Za-z1-9]{2,8}-[A-Za-z1-9]{2,8})', sInput))
# Gets address by starting at (but ignoring) the job source e.g. -RESC and capturing everything until the next . period,
# as the end of the address section always has a period. The aim is to use (?:...) to skip up to two sets of brackets
# that may appear in the string for things such as box numbers or alarm types.
pAddress = ''.join(re.findall(r'-[A-Z1-9]{2,8} (.*?)\. \(', sInput))
# Finds the specified cross streets as they are always within () brackets, each bracket has a space immediately
# before or after and the word XStr is always present.
pCrossStreet = ''.join(re.findall(r' \((XStr.*?)\) ', sInput))
# The job details / description is always contained between two . periods e.g. .42YOM CARDIAC ARREST. each period
# has a space either immediately before or after.
pJobDetails = ''.join(re.findall(r' \.(.*?)\. ', sInput))
# Job number is always in the format #F followed by seven digits. The # is always preceded by a space. Allowed
# between 1 and 8 digits for future proofing.
pJobNumber = ''.join(re.findall(r' (#F\d{0,7})', sInput))
# Gets the optional Alarm type, which always starts with a space followed by (Alarm
pAlarmDetails = ''.join(re.findall(r' \((Alarm .*?)\) ', sInput))
# Gets the optional Box type, which always starts with a space followed by (Box
pBoxDetails = ''.join(re.findall(r' (\(Box .*?\))', sInput))
print("Responding Trucks: " + pTrucks)
print("Job Source / Type: " + pJobSource)
print("Address: " + pAddress)
print("Cross Streets: " + pCrossStreet)
print("Job Details: " + pJobDetails)
print("Additional Info: " + pAlarmDetails + ", " + pBoxDetails)
print("\n\nJob Number: " + pJobNumber)
The problem is that the pager input has two optional fields, (Alarm Type *) and (Box *). Depending on the job, both may be present, both may be absent, or only one of the two may appear. As it stands, the code returns:
Responding Trucks: CUPE123,CUPE124,MTVW211,MTVW215,SUNV5326
Job Source / Type: ALARM-STRUC
Address: (Alarm Type THERMAL SMOKE) (Box 12345) APPLE INC - 1 INFINITE LOOP CUPERTINO
Cross Streets: XStr DE ANZA BLVD/MARIANI AVE
Job Details: BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED
Additional Info: Alarm Type THERMAL SMOKE, (Box 12345)
Job Number: #F9876543
Everything is perfect except the Address line which has also pulled in the Alarm type and the Box#.
How can I modify the RegEx so that the (Alarm Type) and (Box) fields are treated as optional? I tried the following, adapted from another SO thread, and it works perfectly with the current sInput string:
pAddress = ''.join(re.findall(r'-[A-Z1-9]{2,8}(?: \(Alarm .*?\))(?: \(Box .*\)) (.*?)\. \(', sInput))
returning
Responding Trucks: CUPE123,CUPE124,MTVW211,MTVW215,SUNV5326
Job Source / Type: ALARM-STRUC
Address: APPLE INC - 1 INFINITE LOOP CUPERTINO
Cross Streets: XStr DE ANZA BLVD/MARIANI AVE
Job Details: BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED
Additional Info: Alarm Type THERMAL SMOKE, (Box 12345)
Job Number: #F9876543
which is perfect and my desired result. However, when I change the sInput string to contain neither (Alarm Type *) nor (Box *):
sInput = '(CUPE123, CUPE124, MTVW211, MTVW215, SUNV5326) ALARM-STRUC APPLE INC - 1 INFINITE LOOP CUPERTINO. (XStr DE ANZA BLVD/MARIANI AVE) .BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED. #F987654321'
The output then returns nothing in the address field
Responding Trucks: CUPE123,CUPE124,MTVW211,MTVW215,SUNV5326
Job Source / Type: ALARM-STRUC
Address:
Cross Streets: XStr DE ANZA BLVD/MARIANI AVE
Job Details: BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED
Additional Info: ,
Job Number: #F9876543
I feel like I'm so close and just missing something... Sorry for the long post, might be a bit TMI.
TL;DR How can I modify the RegEx of the pAddress variable to ignore the (Alarm Type *) and (Box *) fields regardless of whether they are present or not?
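One direction that may be worth trying is sketched below. The attempted pattern requires both bracketed groups to be present, because neither non-capturing group carries a ? quantifier. The sketch marks each group as optional with a trailing ?. It also swaps .*? and .* for [^)]*, which assumes the Alarm and Box contents never contain a closing bracket. The names ADDRESS_RE and extract_address are hypothetical and used only for illustration. This has only been checked against the sample message format shown above.

```python
import re

# Hypothetical names for illustration; a sketch, not a verified fix for every pager format.
ADDRESS_RE = re.compile(
    r'-[A-Z1-9]{2,8}'          # tail of the job source, e.g. -STRUC
    r'(?: \(Alarm [^)]*\))?'   # optional (Alarm ...) field; trailing ? makes it optional
    r'(?: \(Box [^)]*\))?'     # optional (Box ...) field; assumes no ) inside the field
    r' (.*?)\. \('             # address, captured up to the period before the next bracket
)

def extract_address(message):
    """Return the address portion of a pager message, or '' if nothing matches."""
    match = ADDRESS_RE.search(message)
    return match.group(1) if match else ''

with_both = ('(CUPE123, CUPE124, MTVW211, MTVW215, SUNV5326) ALARM-STRUC '
             '(Alarm Type THERMAL SMOKE) (Box 12345) APPLE INC - 1 INFINITE LOOP CUPERTINO. '
             '(XStr DE ANZA BLVD/MARIANI AVE) .BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED. '
             '#F987654321')
with_neither = ('(CUPE123, CUPE124, MTVW211, MTVW215, SUNV5326) ALARM-STRUC '
                'APPLE INC - 1 INFINITE LOOP CUPERTINO. '
                '(XStr DE ANZA BLVD/MARIANI AVE) .BUILDING FIRE - SMOKE SHOWING - PERSONS REPORTED. '
                '#F987654321')

# Both sample messages should yield the same address string.
print("Address: " + extract_address(with_both))
print("Address: " + extract_address(with_neither))
```

Using search with a single capture group instead of ''.join(re.findall(...)) is a design choice in this sketch: it returns only the first match, so a later stretch of the message that happens to fit the pattern cannot be appended to the address.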