I'm trying to parse out some mail logs that have the three following possible formats for the relay.

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com. [0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=[0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com., tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com

With this code:

my $topat    = '^(\w{3})\s{1,2}(\d{1,2}) (\d{2}:\d{2}:\d{2}).+ sendmail\[\d.+\]: (\w+): to=<(\S+)>(?:,|, \[more\],) delay.+, relay=(?:(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.), .+, stat=(.+)';

foreach my $line(@i) {
  if($line =~ /$topat/){
    my ($month, $day, $time, $id, $addy, $relay, $stat) = ($line =~ m/$topat/);
     print $line;
     print "$addy $relay $stat\n";
  }
}

I get the following errors:

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com. [0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Use of uninitialized value $stat in concatenation (.) or string at ./reg_test line 26.
email@company.com 0.0.0.0 

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=[0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Use of uninitialized value $stat in concatenation (.) or string at ./reg_test line 26.
email@company.com 0.0.0.0 

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com., tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Use of uninitialized value $relay in concatenation (.) or string at ./reg_test line 26.
email@company.com  mail-company.com

In the first two cases it captures the address and relay correctly but not the stat. In the third it captures the address, but $relay comes out blank and $stat contains the relay.

I've tried a number of different configurations and groupings and can't seem to find the right solution. Any pointers would be much appreciated.

tleif
  • In the first line, the `stat` (last group) matches `Deferred: Connection reset by mail-company.com`, isn't it expected? I can't see any specific issues with the third line either, see [your regex demo](https://regex101.com/r/SOpL99/2) – Wiktor Stribiżew Oct 24 '19 at 21:29
  • It does match correctly in the tester, but when I try printing it out with the above code, or if I try printing "$1 $2 $3 $4 $5 $6 $7\n" I come out with the same errors and problems as above. – tleif Oct 24 '19 at 21:42

1 Answer

You have two alternatives in the relay field:

relay=(?:(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.)
                    ^    ----      $6         ----     ^  | ^$7^ 

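If the relay doesn't match the first alternative but matches the second one, it is captured in $7, which your list assignment stores in $stat, while $relay ($6) stays undefined. $stat is never populated correctly because the stat text is captured in $8, not $7.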
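
To see the numbering for yourself, here is a minimal sketch that prints every capture group for the third sample line. The pattern and log line are copied from the question; the helper name `captures` is only an illustration and not part of the original script.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged.
my $topat = '^(\w{3})\s{1,2}(\d{1,2}) (\d{2}:\d{2}:\d{2}).+ sendmail\[\d.+\]: (\w+): to=<(\S+)>(?:,|, \[more\],) delay.+, relay=(?:(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.), .+, stat=(.+)';

# Third sample line from the question: the relay is a host name only.
my $line = 'Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: '
         . 'to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, '
         . 'relay=mail-company.com., tls=no, dsn=4.0.0, '
         . 'stat=Deferred: Connection reset by mail-company.com';

# Hypothetical helper: return all capture groups as a list.
# Groups that did not take part in the match come back as undef.
sub captures {
    my ($text) = @_;
    return $text =~ /$topat/;
}

my @groups = captures($line);
for my $i (0 .. $#groups) {
    printf "\$%d = %s\n", $i + 1, defined $groups[$i] ? $groups[$i] : '(undef)';
}
```

For this line the list should contain eight groups, with $6 undefined, the relay host in $7, and the status text in $8.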

You can use the branch reset pattern, (?|...), available since Perl 5.10, which uses the same capture numbers for all alternatives:

(?|(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.)
  ^
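
Put together, a sketch of the whole loop with the branch reset in place might look like this. The three log lines are rebuilt from the samples in the question; the names `parse_line`, `$prefix`, `$suffix`, and `@relays` are hypothetical and only there to keep the example short.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs Perl 5.10 or later

# Same pattern as in the question, except the relay group uses (?| instead of (?:
# so both alternatives capture into $6 and the stat text lands in $7.
my $topat = qr/^(\w{3})\s{1,2}(\d{1,2}) (\d{2}:\d{2}:\d{2}).+ sendmail\[\d.+\]: (\w+): to=<(\S+)>(?:,|, \[more\],) delay.+, relay=(?|(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.), .+, stat=(.+)/;

# Shared parts of the three sample lines from the question.
my $prefix = 'Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: '
           . 'to=<email@company.com>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, ';
my $suffix = ', tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com';

# The three relay formats.
my @relays = (
    'relay=mail-company.com. [0.0.0.0]',
    'relay=[0.0.0.0]',
    'relay=mail-company.com.',
);

# Hypothetical helper: return (address, relay, stat), or an empty list if the line does not match.
sub parse_line {
    my ($line) = @_;
    my ($month, $day, $time, $id, $addy, $relay, $stat) = $line =~ $topat
        or return;
    return ($addy, $relay, $stat);
}

for my $relay_field (@relays) {
    my ($addy, $relay, $stat) = parse_line($prefix . $relay_field . $suffix)
        or next;
    print "$addy $relay $stat\n";
}
```

Matching once and testing the result of the list assignment also avoids running the regex twice per line, as the original loop does.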

Or, use the original regex and populate two variables:

    my ($month, $day, $time, $id, $addy, $relay, $relay_alt, $stat) = $line =~ m/$topat/;
    $relay //= $relay_alt;
choroba