
I have a config from Cisco ASA and I need to write a Python RegEx to capture everything that is in the object-groups and group them for further processing.

For example:

object-group network FTP
 description FTP Access
 network-object host BCD1
 network-object host BCD2
object-group network NTP
 description NTP Access
 network-object host ABC1
 network-object host ABC2
 network-object host ABC3
object-group service sample_service tcp
 description Ports 1 2 3
 port-object range 80 81
 port-object eq pop3
 port-object eq imap4
 port-object range 443 444
object-group service 8080 tcp
 description Servers

The end result should be something like this:

Group 1: object-group network FTP
          description FTP Access
          network-object host BCD1
          network-object host BCD2

Group 2:  object-group network NTP
          description NTP Access
          network-object host ABC1
          network-object host ABC2
etc.

I am very bad at this. I tried to come up with something, but the result was horrible:

(object-group\s[^!]*)object or (object-group[^!]*)

Both of them failed.

Remi Guan
gh0st

2 Answers


You can use this regex written with the unroll-the-loop technique in mind:

\bobject-group\b\S*(?:\s+(?!object-group\b)\S*)*

See regex demo. It is basically the same as (?s)object-group(?:(?!\bobject-group\b).)*, or (?s)object-group.*?(?=\bobject-group\b|$), but is more efficient: the lookahead is only checked after each run of whitespace instead of at every character.
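
To check the "basically the same" claim, here is a small sketch that runs all three patterns over a shortened version of the sample config from the question. The variable names are only illustrative. One difference it surfaces: the two lookahead-based alternatives keep the newline that precedes the next object-group, so each match is stripped before comparing.

```python
import re

# Shortened sample taken from the question
config = (
    "object-group network FTP\n"
    " description FTP Access\n"
    " network-object host BCD1\n"
    "object-group network NTP\n"
    " description NTP Access\n"
    "object-group service 8080 tcp\n"
    " description Servers"
)

patterns = {
    "unrolled": r'\bobject-group\b\S*(?:\s+(?!object-group\b)\S*)*',
    "tempered": r'(?s)object-group(?:(?!\bobject-group\b).)*',
    "lazy": r'(?s)object-group.*?(?=\bobject-group\b|$)',
}

# Strip each match so trailing whitespace does not affect the comparison
results = {
    name: [match.strip() for match in re.findall(regex, config)]
    for name, regex in patterns.items()
}

print(results["unrolled"] == results["tempered"] == results["lazy"])
```

On this input the three lists come out identical once stripped, so the choice between the patterns is about speed and readability rather than about what gets captured.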

Explanation:

  • \bobject-group\b - literal sequence of characters object-group (a whole word due to \b word boundaries)
  • \S* - zero or more non-whitespace symbols
  • (?:\s+(?!object-group\b)\S*)* - zero or more sequences of...
    • \s+(?!object-group\b) - 1 or more whitespace symbols that are not followed by the whole word object-group
    • \S* - zero or more non-whitespace symbols.

Python code:

import re
p = re.compile(r'\bobject-group\b\S*(?:\s+(?!object-group\b)\S*)*')
test_str = "object-group network FTP\n description FTP Access\n network-object host BCD1\n network-object host BCD2\nobject-group network NTP\n description NTP Access\n network-object host ABC1\n network-object host ABC2\n network-object host ABC3\nobject-group service sample_service tcp\n description Ports 1 2 3\n port-object range 80 81\n port-object eq pop3\n port-object eq imap4\n port-object range 443 444\nobject-group service 8080 tcp\n description Servers"
print(re.findall(p, test_str))
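
If you want something closer to the "Group 1: ..." layout from the question, one possible sketch is to iterate with finditer and split every match into lines. The helper name split_groups and the shortened sample are my own, not part of the original config.

```python
import re

pattern = re.compile(r'\bobject-group\b\S*(?:\s+(?!object-group\b)\S*)*')

# Shortened sample taken from the question
sample = (
    "object-group network FTP\n"
    " description FTP Access\n"
    " network-object host BCD1\n"
    " network-object host BCD2\n"
    "object-group network NTP\n"
    " description NTP Access\n"
    " network-object host ABC1\n"
)


def split_groups(text):
    """Return one list of stripped, non-empty lines per object-group block."""
    return [
        [line.strip() for line in match.group(0).splitlines() if line.strip()]
        for match in pattern.finditer(text)
    ]


groups = split_groups(sample)
for number, lines in enumerate(groups, start=1):
    # First line is the object-group header, the rest is its body
    print(f"Group {number}: {lines[0]}")
    for line in lines[1:]:
        print(f"         {line}")
```

Keeping each block as a list of lines makes the later processing (for example, pulling out the hosts) a matter of plain string methods.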
Wiktor Stribiżew
  • Thank you very much stribizhev! The regex works! One quick question: it is not a big deal, as I already found a workaround, but it would be awesome if you could help. After the object-groups comes the access-list configuration, and your regex captures it as well. Is there a way to exclude it from the match? My whole goal is to open the running config and run a regex on it without chopping it to pieces. – gh0st Jan 15 '16 at 10:46
  • I believe all you need is to add that string as an alternative: [`\bobject-group\b\S*(?:\s+(?!(?:access-list|object-group)\b)\S*)*`](https://regex101.com/r/hG3oG5/4). – Wiktor Stribiżew Jan 15 '16 at 10:49
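
A quick way to try the pattern from the comment above is sketched below. The access-list line is a made-up example, and the anchored variant is my own assumption, not something from the thread: it may help if your access-list lines themselves mention object-group, because the unanchored pattern would then pick up the tail of such a line as an extra match.

```python
import re

# Hypothetical excerpt: the access-list line is invented for illustration
config = (
    "object-group network FTP\n"
    " description FTP Access\n"
    " network-object host BCD1\n"
    "object-group service 8080 tcp\n"
    " description Servers\n"
    "access-list outside_in extended permit tcp any object-group FTP eq ftp\n"
)

body = r'\bobject-group\b\S*(?:\s+(?!(?:access-list|object-group)\b)\S*)*'

# Pattern exactly as given in the comment
from_comment = re.compile(body)
# Assumption, not from the thread: only accept the keyword at the start of a line
anchored = re.compile('^' + body, re.MULTILINE)

loose = [match.strip() for match in from_comment.findall(config)]
strict = [match.strip() for match in anchored.findall(config)]

print(len(loose), len(strict))
```

With this sample, both versions stop each block before the access-list line, but only the anchored one ignores the object-group reference inside it.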

You don't need a complicated, difficult-to-understand regex to do this. Simply iterate over the file, breaking on lines that begin with object-group, and build up a dictionary of lists.

You can do it with itertools.groupby() or a defaultdict of lists. I prefer the latter, which will give you a dictionary useful for further processing:

from collections import defaultdict

object_groups = defaultdict(list)
key = 0
with open('cisco.cfg') as f:
    for line in f:
        if line.startswith('object-group'):
            key += 1
        object_groups[key].append(line.strip())

from pprint import pprint
pprint(list(object_groups.items()))  # list() so Python 3 prints the pairs, not a dict_items view

Assuming your sample input, the output would be:

[(1,
  ['object-group network FTP',
   'description FTP Access',
   'network-object host BCD1',
   'network-object host BCD2']),
 (2,
  ['object-group network NTP',
   'description NTP Access',
   'network-object host ABC1',
   'network-object host ABC2',
   'network-object host ABC3']),
 (3,
  ['object-group service sample_service tcp',
   'description Ports 1 2 3',
   'port-object range 80 81',
   'port-object eq pop3',
   'port-object eq imap4',
   'port-object range 443 444']),
 (4, ['object-group service 8080 tcp', 'description Servers'])]

Also, you could instead use the object group identifiers as keys:

from collections import defaultdict

object_groups = defaultdict(list)
key = None
with open('cisco.cfg') as f:
    for line in f:
        if line.startswith('object-group'):
            # key = line.strip()                   # alternative: the whole line
            key = line.strip().partition(' ')[-1]  # just the object group definition
        else:
            object_groups[key].append(line.strip())

from pprint import pprint
pprint(list(object_groups.items()))  # list() so Python 3 prints the pairs, not a dict_items view

which will create a similar dictionary but with keys 'network FTP', 'network NTP', 'service sample_service tcp' etc.
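
For completeness, here is a rough sketch of the itertools.groupby() route mentioned at the start of this answer. It relies on a key function that counts headers, which is just one way to do it; the names split_with_groupby and block_number are my own, and the input is a shortened copy of the sample.

```python
from itertools import groupby

# Shortened sample taken from the question
sample_lines = [
    "object-group network FTP",
    " description FTP Access",
    " network-object host BCD1",
    "object-group service 8080 tcp",
    " description Servers",
]


def split_with_groupby(lines):
    """Group consecutive lines under the object-group header that precedes them."""
    counter = 0

    def block_number(line):
        # Bump the counter on every header so each block gets its own key
        nonlocal counter
        if line.startswith("object-group"):
            counter += 1
        return counter

    return {
        number: [line.strip() for line in block]
        for number, block in groupby(lines, key=block_number)
    }


object_groups = split_with_groupby(sample_lines)
print(object_groups)
```

The stateful key function is the price of using groupby() here, which is part of why the defaultdict loop reads more simply.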

mhawke
  • Thanks for showing this to me mhawke! This is really awesome. I love this method because I don't have to write regex! – gh0st Jan 15 '16 at 10:49
  • @gh0st: exactly. Regex can be difficult to understand and maintain. This method is easy to understand, and easy to modify later when, as usual, changes become inevitable. Not sure then why you have accepted a regex answer, but perhaps that is your requirement? – mhawke Jan 15 '16 at 10:55
  • I wish I could select both answers, and to be honest I liked your answer more because I can understand the logic behind it, but my question was regarding regex and stribizhev answered it. It would be unfair if I left his work unnoticed. You went the extra mile and showed me a completely different approach, and I appreciate that very much! – gh0st Jan 15 '16 at 11:28
  • @gh0st: That's fair enough. – mhawke Jan 15 '16 at 11:58