
I have this input:

AB2.HYNN.KABCDSEG.L000.G0001V00
AB2.HYNN.GABCDSEG.L000.G0005V00

I would like to remove the trailing .GXXXXVXX part from each string.

When I use this code:

$result  =~ s/\.G.*V.*$//g;
print "$result \n";

The result is:

AB2.HYNN.KABCDSEG.L000
AB2.HYNN

It seems that each time the regex finds ".G", it removes everything from there to the end of the string. I don't understand why.

I would like to have this:

AB2.HYNN.KABCDSEG.L000
AB2.HYNN.GABCDSEG.L000

How can I do this with a regex?

tchrist
Patrick

2 Answers

$result =~ s/\.G\d+V\d+//g;

This works on the given input.
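A minimal sketch of that substitution applied to both sample lines. The helper name `strip_generation` is hypothetical, and the trailing `$` is an assumption added here so that only a suffix at the end of the string is removed; the pattern above works on the sample input without it.

```perl
use strict;
use warnings;

# Hypothetical helper: remove a trailing ".G<digits>V<digits>" suffix.
sub strip_generation {
    my ($name) = @_;
    # Work on a copy so the caller's string is left untouched.
    # The "$" anchor is an addition to the pattern shown above.
    (my $stripped = $name) =~ s/\.G\d+V\d+$//;
    return $stripped;
}

for my $name (qw(AB2.HYNN.KABCDSEG.L000.G0001V00
                 AB2.HYNN.GABCDSEG.L000.G0005V00)) {
    print strip_generation($name), "\n";
}
```

Because `\d+` only accepts digits, the `.G` in `.GABCDSEG` cannot start a match, so the second line keeps its `GABCDSEG` field.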

Artyom V. Kireev
  • 588
  • 4
  • 12

Update:

After talking in the comments, the final solution was:

s/\.G\w+V\w+$//;

In your regex:

s/\.G.*V.*$//g;

those .* are greedy and will match as much as possible. The only requirement the pattern imposes is that a V appears somewhere after the .G, so it truncates the string from the first .G it finds, as long as a V follows anywhere later in the string. In your second line the first .G is the one in .GABCDSEG, and the V in G0005V00 satisfies the rest of the pattern, so everything from .GABCDSEG onward is removed. There is no need for the /g modifier here, because any match that occurs deletes the rest of the string. The one exception is a string that contains newlines, because . does not match a newline without the /s modifier.
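To see the difference concretely, here is a small sketch that prints what each pattern actually matches on the second input line. The helper `matched_text` is hypothetical and exists only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the substring that a pattern matches,
# or undef if there is no match.
sub matched_text {
    my ($re, $str) = @_;
    return $str =~ /($re)/ ? $1 : undef;
}

my $input = 'AB2.HYNN.GABCDSEG.L000.G0005V00';

# Original pattern: the match starts at the first ".G" (in ".GABCDSEG"),
# because "." in ".*" also matches the field-separating dots.
print matched_text(qr/\.G.*V.*$/, $input), "\n";

# \w does not match ".", so the match cannot span several fields.
print matched_text(qr/\.G\w+V\w+$/, $input), "\n";
```

The first line printed is `.GABCDSEG.L000.G0005V00` and the second is `.G0005V00`, which is why only the `\w+` version leaves `AB2.HYNN.GABCDSEG.L000` behind.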

TLP
  • Thanks, it works. But it works only if there are digits between G and V and after V. How can the regex work if the characters are not only digits? – Patrick May 04 '12 at 17:57
  • @Patrick Then you'll have to be more specific in your requirements. You can use `\.G...V..$` to match exactly 3 and 2 wildcard characters. You can use `\.G.{0,3}V.{0,2}$` to match 0 to 3 and 0 to 2 wildcard characters. Since you don't mention what kind of characters can be there, I cannot tell you how to match them. – TLP May 04 '12 at 18:16
  • For example: AB2.HYNN.GABCDVEG.L000.GA1CDV6I. In the expression GXXXXVXX, the X characters can be numeric or alphabetic. – Patrick May 04 '12 at 18:19
  • I suggest leading with the thing that works and explaining the broken one after. :) – brian d foy May 05 '12 at 05:43