I am trying to parse data from a line like this
"Lorem ipsum dolor sit amet, IP: 111.111.111.111, 222.222.222.222, 333.333.333.333\r\n adipiscing elit, sed do eiusmod\r\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud"
I am trying to capture the values like this:
- message:
"Lorem ipsum dolor sit amet, IP: 111.111.111.111, 222.222.222.222, 333.333.333.333\r\n adipiscing elit, sed do eiusmod\r\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud"
- ip:
"111.111.111.111, 222.222.222.222, 333.333.333.333"
There can be arbitrary many IPs, including zero.
I am using fluent-bit with a single regex. This is an example of a fluent-bit parser definition:
[PARSER]
Name syslog-rfc3164
Format regex
Regex /^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
Time_Key time
Time_Format %b %d %H:%M:%S
Time_Format %Y-%m-%dT%H:%M:%S.%L
Time_Keep On
Thanks to Cary and Aleksei here is the solution:
\A(?<whole>.*?((?<=IP: )(?<ip>(?<four_threes>\d{1,3}(?:\.\d{1,3}){3})(?:, \g<four_threes>)*)).*?)\z
https://rubular.com/r/Kgh5EXMCA0lkew
EDIT
I realized that some strings don't have the "IP:..." pattern in them which give me a parsing error.
string1: "Lorem ipsum dolor sit amet, IP: 111.111.111.111, 222.222.222.222, 333.333.333.333\r\n adipiscing elit, sed do eiusmod\r\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud"
string2: "Lorem ipsum dolor sit amet, \r\n adipiscing elit, sed do eiusmod\r\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud"
I tried applying *(0 or more) to the ip group name match but i was not able to make it work. Any idea how i can do this?