I am trying to extract several fields from a log file. I am having trouble with mixtures of IPv4 addresses, subnets and variables. So far I can only match one kind of field (i.e. IP or string).

import re

regex = re.search(
    r'.*(?P<destination_address>\b((?:\d+\.){3}\d+(?:/\d+)?)|\w+)\b(?P<destination_port>\d+)?\b(?P<destination_options>\w)?(?=via|\Z|//)',
    "Myfirewall add 50750 set Mycounter allow udp from any to 123.45.67.89/28  123 via someotheriface"
)

regex2 = re.search(
    r'.*(?P<destination_address>\b((?:\d+\.){3}\d+(?:/\d+)?)|\w+)\b(?P<destination_port>\d+)?\b(?P<destination_options>\w)?(?=via|\Z|//)',
    "Myfirewall add 50750 set Mycounter allow udp from 123.45.67.89/28 to Mynic opt1 opt2,opt3 via someotheriface"
)

In both cases, there is no match. I would expect regex.group("destination_port") == "123" and regex2.group("destination_options") == "opt1 opt2,opt3".

What I can currently extract: all required fields up to the keyword "to" (not shown here, let me know if relevant). What I am still struggling with:

  • capturing the string between "to" and "via", comment start (//) or newline

  • deciding whether it is a constant (IPv4) or variable (string), this is the main part

  • separating the main part from secondary parts - ports or options

If a regex for this task is too complicated, I am open to alternative solutions. I have used several other questions to build my regex so far:

Python regex to match IP-address with /CIDR

Python regex capture whole integer

(Python) Regex to extract network-object group from Cisco config

vvvvv
xancho
  • Can you update the question with example strings and the expected matches? – The fourth bird Mar 01 '23 at 11:31
  • Your first regex does match, it just doesn't match anything useful. There are many optional fields which the regex engine will skip if it has to in order to capture a match; probably make those obligatory until you get the match you expect and then take it from there. Figuring out what you hoped should match and why requires quite a bit of guesswork. I'm also wondering if you forgot to put whitespace where there should be some. A regex can only match a piece of contiguous text – tripleee Mar 01 '23 at 12:31
  • Guessing a bit, I would try something like `r'\b(?P<destination_address>(?:\d+\.){3}\d+(?:/\d+)?)\W+(?P<destination_port>\d+)\W+(?P<destination_options>\w+)\W+(?=via|\Z|//)'` for the first expression – tripleee Mar 01 '23 at 12:31
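
The point made in the comments (the first regex does match, just not usefully) can be checked directly. This is a minimal sketch that reuses the pattern and the first sample line from the question unchanged; only the variable names `pattern`, `line` and `m` are introduced here for illustration.

```python
import re

# Pattern and sample line copied verbatim from the question.
pattern = r'.*(?P<destination_address>\b((?:\d+\.){3}\d+(?:/\d+)?)|\w+)\b(?P<destination_port>\d+)?\b(?P<destination_options>\w)?(?=via|\Z|//)'
line = "Myfirewall add 50750 set Mycounter allow udp from any to 123.45.67.89/28  123 via someotheriface"

m = re.search(pattern, line)

# The greedy .* swallows almost the whole line, the \w+ alternative takes
# the last character, and the optional groups are skipped entirely.
print(m is not None)                     # True
print(m.group("destination_address"))    # e
print(m.group("destination_port"))       # None
print(m.group("destination_options"))    # None
```

So `re.search` returns a match object, but the named groups hold nothing useful, which is why making the optional groups mandatory while debugging helps narrow down the problem.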

1 Answer

"If a regex for this task is too complicated, I am open to alternative solutions. I have used several other issues to build my regex so far."

My suggestion is to have a look at Humre, "a human-readable regular expression module for Python", developed by Al Sweigart and available on GitHub in the humre repository.

As an example, here is the difference between re and Humre:

  • American Phone Number with re:
import re
re.compile(r'\d{3}-\d{3}-\d{4}')
  • American Phone Number with Humre:
from humre import *
compile(exactly(3, DIGIT), "-", exactly(3, DIGIT), "-", exactly(4, DIGIT))
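
For the lines in the question, another option is to avoid one large regex altogether. The sketch below uses a small regex only to cut out the text between "to" and "via", "//" or the end of the line, then lets the standard-library `ipaddress` module decide whether the first token is an IPv4 address or subnet. The function name `parse_destination`, the result keys, and the rule that a purely numeric remainder is a port (anything else is treated as options) are assumptions made for illustration, not something stated in the question.

```python
import ipaddress
import re

# Text between the keyword "to" and the first of: "via", "//", end of line.
DEST_RE = re.compile(r'\bto\s+(?P<dest>.*?)\s*(?=\bvia\b|//|$)')


def parse_destination(line):
    """Split the destination part of a rule line into address, port, options."""
    m = DEST_RE.search(line)
    if m is None:
        return None
    tokens = m.group("dest").split(None, 1)
    if not tokens:
        return None
    address = tokens[0]
    rest = tokens[1].strip() if len(tokens) > 1 else ""

    # Constant (IPv4 address or subnet) versus variable (any other string).
    try:
        ipaddress.IPv4Network(address, strict=False)
        kind = "ipv4"
    except ValueError:
        kind = "variable"

    # Assumption: an all-digit remainder is a port, anything else is options.
    port = rest if rest.isdigit() else None
    options = rest if rest and port is None else None
    return {"address": address, "kind": kind, "port": port, "options": options}


line1 = "Myfirewall add 50750 set Mycounter allow udp from any to 123.45.67.89/28  123 via someotheriface"
line2 = "Myfirewall add 50750 set Mycounter allow udp from 123.45.67.89/28 to Mynic opt1 opt2,opt3 via someotheriface"

print(parse_destination(line1))
print(parse_destination(line2))
```

With the two sample lines this yields port "123" for the first and options "opt1 opt2,opt3" for the second. `strict=False` is needed because 123.45.67.89/28 has host bits set and would otherwise be rejected as a network.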

Good Luck

S_IROTH