
My regex matches the last set of alpha characters in the line, regardless of what I do. I want it to match only the first occurrence.

I have tried using the non-greedy operator, but it stubbornly matches the right-most set of alpha characters, in this case giving $1 the value "Trig", which isn't what I want. I want $1 to be "02.04.07.06 Geerite".

Code

elsif ($line =~ /\s(\d{2}\.\d{2}\.\d{2}\.\d{2}\s[[:alpha:]]*?)/)
{
    print OUTPUT "NT5 " . $1 . " | | \n";
}

Source

02.04.07.06 Geerite Cu8S5 R 3m, R 3m, or R 32 Trig

Output

NT2 32 Trig | |

So in other words, I want this output:

NT2 02.04.07.06 Geerite | |

Peter Mortensen
Your output is prefixed with `NT2`, not the `NT5` in your code sample. Are you sure this is the regex that is actually matching? – a'r Dec 08 '11 at 14:54
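One quick way to check a'r's point is the minimal sketch below. It assumes the source line is exactly as shown above, with no leading whitespace. In that case the pattern's leading \s has nothing to match before `02`. No later whitespace in the line is followed by the dd.dd.dd.dd form either, so this elsif branch should not fire at all.

```perl
use strict;
use warnings;

# Source line exactly as posted, with no leading whitespace (assumption).
my $line = "02.04.07.06 Geerite Cu8S5 R 3m, R 3m, or R 32 Trig";

# The question's pattern: \s must come immediately before the first digit pair.
my $matched = ($line =~ /\s(\d{2}\.\d{2}\.\d{2}\.\d{2}\s[[:alpha:]]*?)/) ? 1 : 0;

print "matched: $matched\n";
```

If this prints 0 for the real input, the "NT2 32 Trig" line must come from another branch.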

4 Answers


If I change your code to

$line="     02.04.07.06 Geerite Cu8S5 R 3m, R 3m, or R 32 Trig ";
if ($line =~ /\s(\d{2}\.\d{2}\.\d{2}\.\d{2}\s[[:alpha:]]*?)/) { print "NT5 ".$1." | | \n"; }

I get this output:

NT5 02.04.07.06  | | 

If I make the * greedy (remove the ?), the word Geerite is included in the output.

Your observed output probably comes from a different branch of the if-elsif-else tree.
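The sketch below compares both quantifiers on the same padded line used above. The variable names $lazy and $greedy are only for illustration.

```perl
use strict;
use warnings;

my $line = "     02.04.07.06 Geerite Cu8S5 R 3m, R 3m, or R 32 Trig ";

# Lazy: [[:alpha:]]*? is the last thing in the pattern, so it takes zero letters.
my ($lazy)   = $line =~ /\s(\d{2}\.\d{2}\.\d{2}\.\d{2}\s[[:alpha:]]*?)/;

# Greedy: [[:alpha:]]* takes every letter up to the space after "Geerite".
my ($greedy) = $line =~ /\s(\d{2}\.\d{2}\.\d{2}\.\d{2}\s[[:alpha:]]*)/;

print "lazy:   [$lazy]\n";
print "greedy: [$greedy]\n";
```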

choroba

This should work for you:

perl -e '$_ = "02.04.07.06 Geerite Cu8S5 R 3m, R 3m, or R 32 Trig"; print "$1\n" if /(\d\d\.\d\d\.\d\d\.\d\d \w+)/'

prints:

02.04.07.06 Geerite

The regex on its own:

/(\d\d\.\d\d\.\d\d\.\d\d \w+)/
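Unlike the question's regex, this one does not require whitespace before the digits, so it matches with or without indentation. The sketch below shows this, using two made-up input lines. Note that \w also accepts digits and underscores, so it is looser than [[:alpha:]].

```perl
use strict;
use warnings;

# Hypothetical inputs: the same record with and without leading indentation.
my @lines = (
    "02.04.07.06 Geerite Cu8S5 R 3m, R 3m, or R 32 Trig",
    "   02.04.07.06 Geerite Cu8S5 R 3m, R 3m, or R 32 Trig",
);

my @found;
for my $line (@lines) {
    # No leading \s in the pattern, so indentation does not matter.
    if ($line =~ /(\d\d\.\d\d\.\d\d\.\d\d \w+)/) {
        push @found, $1;
        print "$1\n";
    }
}
```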
pgl

Make [[:alpha:]]* greedy (drop the ? after the *):

$line = '   02.04.07.06 Geerite Cu8S5 R 3m, R 3m, or R 32 Trig';
if ($line =~ /\s(\d{2}\.\d{2}\.\d{2}\.\d{2}\s[[:alpha:]]*)/) {
    print OUTPUT "NT5 " . $1 . " | | \n";
}

Output

NT5 02.04.07.06 Geerite | |
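The snippet above prints to OUTPUT, and the code shown never opens that handle. To try it standalone, one option is to point a lexical handle at an in-memory string using Perl's scalar-reference form of open. This is an assumption of mine, not part of the original answer, and $out and $buffer are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the question's OUTPUT handle: write into $buffer.
my $buffer = '';
open(my $out, '>', \$buffer) or die "cannot open in-memory handle: $!";

my $line = '   02.04.07.06 Geerite Cu8S5 R 3m, R 3m, or R 32 Trig';
if ($line =~ /\s(\d{2}\.\d{2}\.\d{2}\.\d{2}\s[[:alpha:]]*)/) {
    print {$out} "NT5 " . $1 . " | | \n";
}
close($out);

print $buffer;
```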
Peter Mortensen
Toto

Your regex can't match " 32 Trig". There must be some other problem.

If I add a space at the beginning of your example string and remove the ungreedy ? after the last quantifier, it will produce the output you want.

$line =~ /\s(\d{2}\.\d{2}\.\d{2}\.\d{2}\s[[:alpha:]]*)/

The [[:alpha:]]*? matches as little as possible, and because nothing follows it in the pattern, it matches 0 characters.
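The sketch below keeps the lazy *? but places a lookahead after it. The (?=\s) is my addition and is not part of the original pattern. The lookahead can only succeed once the name ends, so *? has to keep taking letters until it reaches the space after "Geerite". The test line includes the leading space described above.

```perl
use strict;
use warnings;

my $line = " 02.04.07.06 Geerite Cu8S5 R 3m, R 3m, or R 32 Trig";

# Nothing follows *?, so it settles for zero letters.
my ($bare) = $line =~ /\s(\d{2}\.\d{2}\.\d{2}\.\d{2}\s[[:alpha:]]*?)/;

# (?=\s) must hold right after the capture, forcing *? to extend through "Geerite".
my ($bounded) = $line =~ /\s(\d{2}\.\d{2}\.\d{2}\.\d{2}\s[[:alpha:]]*?)(?=\s)/;

print "bare:    [$bare]\n";
print "bounded: [$bounded]\n";
```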

Peter Mortensen
stema