I'm trying to parse info from ifconfig (Ubuntu). Normally, I would split a chunk of data like this into words and then search for substrings to get what I want. For example, given line = "inet addr:192.168.98.157 Bcast:192.168.98.255 Mask:255.255.255.0" and looking for the broadcast address, I would do:
for word in line.split():
    if word.startswith('Bcast'):
        print word.split(':')[-1]
>>>192.168.98.255
However, I feel it's about time to start learning how to use regular expressions for tasks like this. Here is my code so far. I've hacked through a couple of patterns (inet addr, Bcast, Mask). Questions after the code...
# git clone git://gist.github.com/1586034.git gist-1586034
import re
import json
ifconfig = """
eth0 Link encap:Ethernet HWaddr 08:00:27:3a:ab:47
inet addr:192.168.98.157 Bcast:192.168.98.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe3a:ab47/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:189059 errors:0 dropped:0 overruns:0 frame:0
TX packets:104380 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:74213981 (74.2 MB) TX bytes:15350131 (15.3 MB)\n\n
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:389611 errors:0 dropped:0 overruns:0 frame:0
TX packets:389611 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:81962238 (81.9 MB) TX bytes:81962238 (81.9 MB)
"""
for paragraph in ifconfig.split('\n\n'):
    info = {
        'eth_port': '',
        'ip_address': '',
        'broadcast_address': '',
        'mac_address': '',
        'net_mask': '',
        'up': False,
        'running': False,
        'broadcast': False,
        'multicast': False,
    }
    if 'BROADCAST' in paragraph:
        info['broadcast'] = True
    if 'MULTICAST' in paragraph:
        info['multicast'] = True
    if 'UP' in paragraph:
        info['up'] = True
    if 'RUNNING' in paragraph:
        info['running'] = True
    ip = re.search( r'inet addr:[^\s]+', paragraph )
    if ip:
        info['ip_address'] = ip.group().split(':')[-1]
    bcast = re.search( r'Bcast:[^\s]+', paragraph )
    if bcast:
        info['broadcast_address'] = bcast.group().split(':')[-1]
    mask = re.search( r'Mask:[^\s]+', paragraph )
    if mask:
        info['net_mask'] = mask.group().split(':')[-1]
    print paragraph
    print json.dumps(info, indent=4)
Here are my questions:
Am I taking the best approach for the patterns I have already implemented? Can I grab the addresses without splitting on ':' and then taking the last element of the resulting list?
I'm stuck on HWaddr. What would be a pattern to match this MAC address?
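To make both questions concrete, here is a minimal sketch of what I mean. It assumes a capture group can stand in for the split on ':', and that the MAC address is always six colon-separated pairs of hex digits (that format is my assumption from the sample output, not something I have checked against every ifconfig variant). The name MAC_PATTERN is just for illustration.

```python
import re

line = "inet addr:192.168.98.157 Bcast:192.168.98.255 Mask:255.255.255.0"

# A capture group returns only the text inside the parentheses,
# so no split on ':' is needed afterwards.
bcast = re.search(r'Bcast:(\S+)', line)
broadcast_address = bcast.group(1) if bcast else ''

# Assumed MAC format: six pairs of hex digits separated by colons.
MAC_PATTERN = re.compile(r'HWaddr\s+((?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2})')

header = "eth0 Link encap:Ethernet HWaddr 08:00:27:3a:ab:47"
mac = MAC_PATTERN.search(header)
mac_address = mac.group(1) if mac else ''

print(broadcast_address)
print(mac_address)
```

The stricter hex-pair pattern rejects anything that is not shaped like a MAC address, whereas a plain \S+ after HWaddr would accept whatever token follows.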
EDIT:
OK, so here's how I ended up going about this. I started out trying to do it without regex, just manipulating strings and lists, but that proved to be a nightmare. For example, what separates HWaddr from its address is a space, whereas inet addr is separated from its address by a colon (:). It's a tough problem to scrape with differing separators like this: not only hard to code but also hard to read.
So, I did this with regex. I think this makes a strong case for when to use regular expressions.
# git clone git://gist.github.com/1586034.git gist-1586034
# USAGE: pipe ifconfig into script. ie "ifconfig | python pyifconfig.py"
# output is a list of json datastructures
import sys
import re
import json
ifconfig = sys.stdin.read()
print 'STARTINPUT'
print ifconfig
print 'ENDINPUT'
def extract(input):
    mo = re.search(r'^(?P<interface>eth\d+|eth\d+:\d+)\s+' +
                   r'Link encap:(?P<link_encap>\S+)\s+' +
                   r'(HWaddr\s+(?P<hardware_address>\S+))?' +
                   r'(\s+inet addr:(?P<ip_address>\S+))?' +
                   r'(\s+Bcast:(?P<broadcast_address>\S+)\s+)?' +
                   r'(Mask:(?P<net_mask>\S+)\s+)?',
                   input, re.MULTILINE )
    if mo:
        info = mo.groupdict('')
        info['running'] = False
        info['up'] = False
        info['multicast'] = False
        info['broadcast'] = False
        if 'RUNNING' in input:
            info['running'] = True
        if 'UP' in input:
            info['up'] = True
        if 'BROADCAST' in input:
            info['broadcast'] = True
        if 'MULTICAST' in input:
            info['multicast'] = True
        return info
    return {}

interfaces = [ extract(interface) for interface in ifconfig.split('\n\n') if interface.strip() ]
print json.dumps(interfaces, indent=4)
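As a smaller illustration of the differing-separator point, the idea can also be isolated in a single helper. This is only a sketch: the function name field is hypothetical, and it assumes every label is followed by either a colon or whitespace and that the value is one whitespace-free token.

```python
import re

def field(label, text):
    """Return the token after `label`, whether it is separated by ':' or by whitespace."""
    # \b keeps a short label from matching inside a longer word.
    # (?::\s*|\s+) accepts either separator style.
    mo = re.search(r'\b' + re.escape(label) + r'(?::\s*|\s+)(\S+)', text)
    return mo.group(1) if mo else ''

sample = (
    "eth0      Link encap:Ethernet  HWaddr 08:00:27:3a:ab:47\n"
    "          inet addr:192.168.98.157  Bcast:192.168.98.255  Mask:255.255.255.0\n"
)

print(field('HWaddr', sample))     # separated by a space
print(field('inet addr', sample))  # separated by a colon
print(field('Mask', sample))
```

A label that is absent yields an empty string, which matches the groupdict('') default used in extract above.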