4

Can anyone help me with this regex, please? I need an expression that matches a line only if it does not contain the string "Created" at the end. The script reads the change-history headings in some source code.

$string = "* JAN-01-2001   bugsbunny     1234     Created Module";
#$string = "* DEC-12-2012   bugsbunny     5678     Modified Module";
if($string =~ /^\*\s+(\w\w\w-\d\d-\d\d\d\d)\s+(\w+)\s+(\d+)\s+(?!Created)/){
    print "$1\n$2\n$3\n$4\n";
} else {
    print "no match\n";
}

When using the first $string definition, I need the match to fail because it has the word "Created" at the end of it. When using the second $string definition, it should pass and I need to pull out the date($1), user($2), change number($3) and description($4).

The expression above is not working. Any advice please?

John Lee

4 Answers

4

Close:

/^\*\s+(\w{3}-\d{2}-\d{4})\s+(\w+)\s+(\d+)\s+(?!.*Created)/

You need to allow any number of non-newline characters before Created, therefore the .*.

Without it, the engine simply backtracks \s+ by one character, so the text that follows is " Created" (with a leading space), and the negative lookahead (?!Created) succeeds.

If you test the original regex against the first string, notice how the match stops one space before Created.
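The backtracking described above can be made visible with a short script. This is only a sketch: the trailing (.*) capture and the helper name parse_heading are additions for illustration, not part of the regex in this answer. It prints what is left of the line after the original pattern matches, then the fields returned by the corrected pattern.

```perl
use strict;
use warnings;

my @lines = (
    "* JAN-01-2001   bugsbunny     1234     Created Module",
    "* DEC-12-2012   bugsbunny     5678     Modified Module",
);

# Original pattern: the lookahead is tested right after \s+, which may backtrack.
my $original = qr/^\*\s+(\w{3}-\d{2}-\d{4})\s+(\w+)\s+(\d+)\s+(?!Created)/;

# Corrected pattern: .* inside the lookahead scans the rest of the line.
# The trailing (.*) is an illustrative addition that captures the description.
my $fixed = qr/^\*\s+(\w{3}-\d{2}-\d{4})\s+(\w+)\s+(\d+)\s+(?!.*Created)(.*)/;

# Hypothetical helper: returns (date, user, change, description) or an empty list.
sub parse_heading {
    my ($line) = @_;
    return $line =~ $fixed ? ( $1, $2, $3, $4 ) : ();
}

for my $line (@lines) {
    if ( $line =~ $original ) {
        # $+[0] is the offset at which the whole match ended.
        printf "original matched, rest of line: '%s'\n", substr( $line, $+[0] );
    }
    my @fields = parse_heading($line);
    print @fields ? join( "|", @fields ) . "\n" : "fixed: no match\n";
}
```

For the "Created" line, the original pattern still matches and leaves " Created Module" (note the leading space) unmatched, which is the one-character backtrack in action. Keep in mind that (?!.*Created) rejects a line with "Created" anywhere in the description, not only at its start.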

Tim Pietzcker
  • Also need `.*` after `Created`. Actually OP definition of `end` is wrong. He probably wants `Created` should not be in the last part of the string. – Rohit Jain Feb 05 '13 at 08:03
  • @RohitJain: No. `.*` after `Created` is not necessary. The lookahead match is not anchored to the end of the string. He wrote "does not *contain* `Created` at the end", not "does not end with `Created`". – Tim Pietzcker Feb 05 '13 at 08:04
  • @TimPietzcker.. Ah! How did I miss that. – Rohit Jain Feb 05 '13 at 08:05
  • @TimPietzcker Thank you Tim, that was perfect. I still needed that last description field so ended up using this: /^\*\s+(\w{3}-\d{2}-\d{4})\s+(\w+)\s+(\d+)\s+(?!.*Created)(.*)/ – John Lee Feb 05 '13 at 18:35
1

Another trick that makes this work is an atomic group, written (?>...), which disables backtracking into the group. Once the group has matched, any + or * inside it keeps everything it greedily consumed, and the engine never goes back into the group to try something shorter if a later part of the pattern fails. This means that all of the whitespace before "Created" is consumed, so the (?!Created) part of the regex is always tested at exactly the right point.

if($string =~ /^(?>\*\s+(\w\w\w-\d\d-\d\d\d\d)\s+(\w+)\s+(\d+)\s+)(?!Created)/){
    print "$1\n$2\n$3\n";
} else {
    print "no match\n";
}

This has the added bonus that the regex can fail faster on lines that do not match, because the engine does not retry shorter matches inside the group.

This approach doesn't work for every kind of problem, because many regexes need to be able to backtrack in order to match correctly. But it will work great for this one.
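As a hedged sketch of the same idea, the script below adds a trailing (.*) to capture the description (an addition, not part of the regex above) and compares the atomic group with a possessive quantifier, \s++, which is a different spelling of the same no-backtracking behavior available in Perl 5.10 and later. The helper name fields_for is made up for illustration.

```perl
use strict;
use warnings;

# Atomic group as in the answer, plus an illustrative (.*) for the description.
my $atomic = qr/^(?>\*\s+(\w{3}-\d{2}-\d{4})\s+(\w+)\s+(\d+)\s+)(?!Created)(.*)/;

# Possessive quantifier: \s++ never gives back the whitespace it consumed.
my $possessive = qr/^\*\s+(\w{3}-\d{2}-\d{4})\s+(\w+)\s+(\d+)\s++(?!Created)(.*)/;

# Hypothetical helper: apply a compiled pattern and return the four captures,
# or an empty list when the line does not match.
sub fields_for {
    my ( $re, $line ) = @_;
    return $line =~ $re ? ( $1, $2, $3, $4 ) : ();
}

my @lines = (
    "* JAN-01-2001   bugsbunny     1234     Created Module",
    "* DEC-12-2012   bugsbunny     5678     Modified Module",
);

for my $line (@lines) {
    for my $re ( $atomic, $possessive ) {
        my @f = fields_for( $re, $line );
        print @f ? join( "|", @f ) . "\n" : "no match\n";
    }
}
```

Unlike a lookahead containing .*, both patterns here only reject a description that starts with "Created"; a description such as "Modified Created flag" would still match.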

0

Another option is to split and test the description for 'Created':

use strict;
use warnings;

#my $string = "* JAN-01-2001   bugsbunny     1234     Created Module";
my $string = "* DEC-12-2012   bugsbunny     5678     Modified Module";

my ( undef, $date, $user, $change, $desc ) = split ' ', $string, 5;

if ( $desc !~ /^Created/ ) {
    print "$date\n$user\n$change\n$desc\n";
}
else {
    print "no match\n";
}

Output:

DEC-12-2012
bugsbunny
5678
Modified Module
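One caveat is that a bare split accepts any line with five whitespace-separated fields, whether or not it is a heading. A minimal sketch of a stricter variant follows, assuming headings always start with `*` followed by a MON-DD-YYYY date and a numeric change number; the helper name parse_with_split is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, user, change, description),
# or an empty list when the line is not an accepted heading.
sub parse_with_split {
    my ($line) = @_;

    # Cheap shape check first: '*', whitespace, then a MON-DD-YYYY date.
    return () unless $line =~ /^\*\s+\w{3}-\d{2}-\d{4}\s/;

    # split ' ' skips leading whitespace; the limit of 5 keeps the
    # description (which may contain spaces) in a single field.
    my ( undef, $date, $user, $change, $desc ) = split ' ', $line, 5;

    # Reject incomplete headings, non-numeric change numbers, and "Created" entries.
    return () if !defined $desc || $change !~ /^\d+$/ || $desc =~ /^Created/;

    return ( $date, $user, $change, $desc );
}

for my $line (
    "* JAN-01-2001   bugsbunny     1234     Created Module",
    "* DEC-12-2012   bugsbunny     5678     Modified Module",
    "some other source line with five words",
  )
{
    my @f = parse_with_split($line);
    print @f ? join( "|", @f ) . "\n" : "no match\n";
}
```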
Kenosis
  • The OP stated that this is a search for headings in source code. Thus the rest of the regex was necessary to ensure that one is really looking at a heading, and not some other part of the source. Also, splitting every line will be an inefficient solution when looking for only a small subset of lines. A regex like the original will be far faster because it immediately rejects any line that doesn't begin with `*`. –  Feb 06 '13 at 09:04
  • @dan1111 - A regex solution here isn't necessarily *far faster* than a `split`, as benchmarking shows it's only ~6% faster: [Regex vs. Split](http://pastebin.com/CbHNd4wJ). Where my solution absolutely fails is in *not* matching the OP's string pattern (as yours does), as there may be other headings that could create false positives, an issue that originally escaped me. – Kenosis Feb 06 '13 at 17:30
  • I suppose *far faster* was probably an exaggeration. –  Feb 07 '13 at 08:43
0
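#$string = "* JAN-02-2001   bugsbunny     1234     Created Module";
$string = "* DEC-12-2012   bugsbunny     5678     Modified Module";
# (?!Created) rejects a description that starts with "Created". A character
# class such as [^Created] only excludes single characters, not the word.
if($string =~ /^\*\s+(\w\w\w-\d\d-\d\d\d\d)\s+(\w+)\s+(\d+)\s+(?!Created)(\w+)\s+(\w+)/){
    print "$1\n$2\n$3\n$4\n";
}
else {
    print "no match\n";
}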
Zero Piraeus