I'm trying to parse the result of a whois query. I'm interested in retrieving the route, descr and origin fields as shown below:
route: 129.45.67.8/91
descr: FOO-BAR
descr: Information 2
origin: AS5462
notify: foo@bar.net
mnt-by: AS5462-MNT
remarks: For abuse notifications please file an online case @ http://www.foo.com/bar
changed: foo@bar.net 20000101
source: RIPE
remarks: ****************************
remarks: * THIS OBJECT IS MODIFIED
remarks: * Please note that all data that is generally regarded as personal
remarks: * data has been removed from this object.
remarks: * To view the original object, please query the RIPE Database at:
remarks: * http://www.foo.net/bar
remarks: ****************************
route: 123.45.67.8/91
descr: FOO-BAR
origin: AS3269
mnt-by: BAR-BAZ
changed: foo@bar.net 20000101
source: RIPE
remarks: ****************************
remarks: * THIS OBJECT IS MODIFIED
remarks: * Please note that all data that is generally regarded as personal
remarks: * data has been removed from this object.
remarks: * To view the original object, please query the RIPE Database at:
remarks: * http://www.ripe.net/whois
remarks: ****************************
To do so I use the following code and regex:
import re

search = "FOO-BAR"
with open(FILE, "r") as f:
    content = f.read()
r = re.compile(r'route:\s+(.*)\ndescr:\s+(.*' + search + r'.*).*\norigin:\s+(.*)', re.IGNORECASE)
res = r.findall(content)
print(res)
It works as expected for records containing only one descr field, but it skips records that contain multiple descr fields.
I get the following result in this case:
[('123.45.67.8/91', 'FOO-BAR', 'AS3269')]
The expected result contains, for each record, the route field, the first descr field (when there are several descr lines) and the origin field:
[('129.45.67.8/91', 'FOO-BAR', 'AS5462'), ('123.45.67.8/91', 'FOO-BAR', 'AS3269')]
What would be the correct regex to parse records containing either one or several descr lines?
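For reference, here is a minimal sketch of one direction that might work. It assumes the search term appears in the first descr line of a record and that any extra descr lines sit directly between that line and origin. The inline `content` string is a trimmed copy of the sample above, and the use of `re.escape` and the optional `(?:descr:.*\n)*` group are my own additions, not something confirmed to be the right approach.

```python
import re

# Trimmed copy of the whois sample above (assumption: records follow each other directly).
content = """route: 129.45.67.8/91
descr: FOO-BAR
descr: Information 2
origin: AS5462
notify: foo@bar.net
route: 123.45.67.8/91
descr: FOO-BAR
origin: AS3269
mnt-by: BAR-BAZ
"""

search = "FOO-BAR"

pattern = re.compile(
    r'route:\s+(.*)\n'                              # group 1: route value
    r'descr:\s+(.*' + re.escape(search) + r'.*)\n'  # group 2: first descr line containing the search term
    r'(?:descr:.*\n)*'                              # skip zero or more additional descr lines (not captured)
    r'origin:\s+(.*)',                              # group 3: origin value
    re.IGNORECASE,
)

res = pattern.findall(content)
print(res)
# [('129.45.67.8/91', 'FOO-BAR', 'AS5462'), ('123.45.67.8/91', 'FOO-BAR', 'AS3269')]
```

The original pattern requires origin to follow immediately after the single matched descr line, which is why the first record is skipped. The non-capturing group tolerates the extra descr lines without changing the shape of the tuples returned by `findall`.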