The reason your code isn't working is that you have a greedy ^(.*) at the start of the regular expression. That will consume as much of the target string as possible while still letting the rest of the pattern match, so you will find only the last occurrence of the substring. You can fix it by changing it to the non-greedy pattern ^(.*?).
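As a small illustration of the difference, the sketch below applies both forms to one of the sample records. The pattern that follows the prefix, :([A-Z*]\d+[A-Z*]), is an assumption standing in for the rest of your expression.

```perl
use strict;
use warnings;

# One sample record; the part of the pattern after the prefix is an
# assumed stand-in for the remainder of the original expression.
my $record = 'chr1-247751935-G-.:M92R,chr1-247752366-G-.:R236G,';

# Greedy prefix: .* takes as much as it can, so the capture is the LAST field
my ($greedy) = $record =~ /^.*:([A-Z*]\d+[A-Z*]),/;

# Non-greedy prefix: .*? takes as little as it can, so the capture is the FIRST field
my ($lazy) = $record =~ /^.*?:([A-Z*]\d+[A-Z*]),/;

print "greedy: $greedy\n";    # greedy: R236G
print "lazy:   $lazy\n";      # lazy:   M92R
```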
A few other notes on your regular expression:
There is no need to escape : or , or * when they are inside a character class [...]
There is never a need for the quantifier {1}, as that is the effect of a pattern without a quantifier
There is no need to put \d inside a character class [\d], as it works fine on its own
There is no need to enclose subpatterns in parentheses unless you need to group them or need access to whatever substring matched that subpattern when the match succeeds. So, for instance, ^.* is fine without the parentheses
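To check that these simplifications do not change what is matched, the following sketch compares a deliberately verbose pattern with a trimmed-down equivalent. The verbose pattern is hypothetical, written only to exhibit each of the points above; it is not your original expression.

```perl
use strict;
use warnings;

# Hypothetical verbose pattern: needless escapes, {1} quantifiers,
# [\d] and a capturing group around the prefix
my $verbose = qr/^(.*?)[\:]{1}([A-Z\*]{1}[\d]+[A-Z\*]{1})[\,]{1}/;

# The same pattern with the redundant parts removed
my $concise = qr/^.*?:([A-Z*]\d+[A-Z*]),/;

my @samples = (
    'chr1-247751935-G-.:M92R,chr1-247752366-G-.:R236G,',
    'chr1-247951785-G-.:G98K,',
    'chr13-86597895-S-78:M34*,chr13-56891235-S-8:G87K,chr13-235689125-S-7:M389L,',
);

my $mismatches = 0;
for my $record (@samples) {
    # Dropping the parentheses around the prefix renumbers the captures:
    # the field is $2 in the verbose pattern but $1 in the concise one
    my $from_verbose = $record =~ $verbose ? $2 : undef;
    my $from_concise = $record =~ $concise ? $1 : undef;

    $mismatches++
        unless defined $from_verbose
            && defined $from_concise
            && $from_verbose eq $from_concise;
}

print "mismatches: $mismatches\n";
```

Note that removing a pair of capturing parentheses shifts the numbering of any captures that follow it, so code that reads $2 has to be updated to read $1.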
With that fix in place, this modification works identically to your code but is much more concise:
while ($info1 =~ s/^.*?:([A-Z*]\d+[A-Z*]),// ) {
    my $pos = $1;
    ...
}
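For reference, here is a runnable sketch of that loop on a single record. The contents of $info1 and the @found array are assumptions made for the example; the loop body simply collects each capture.

```perl
use strict;
use warnings;

# Assumed input: one record from the sample data, without the surrounding quotes
my $info1 = 'chr1-247751935-G-.:M92R,chr1-247752366-G-.:R236G,';

my @found;
# Each successful substitution deletes everything up to and including
# the field just captured, so the next pass starts at the following field
while ($info1 =~ s/^.*?:([A-Z*]\d+[A-Z*]),//) {
    my $pos = $1;
    push @found, $pos;
}

print "@found\n";                 # M92R R236G
print "remaining: '$info1'\n";    # remaining: ''
```

The loop ends with $info1 emptied, which is the side effect of the substitution approach.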
But the best solution is to use a global match that finds all occurrences of a pattern within a string, and doesn't need to modify the string in the process.
This program does what you describe. It just looks for every run of uppercase letters, digits, or asterisks that follows a colon in each record.
use strict;
use warnings;
while (<DATA>) {
    my @fields = /:([A-Z0-9*]+)/g;
    print "@fields\n";
}
__DATA__
"chr1-247751935-G-.:M92R,chr1-247752366-G-.:R236G,"
"chr1-247951785-G-.:G98K,"
"chr13-86597895-S-78:M34*,chr13-56891235-S-8:G87K,chr13-235689125-S-7:M389L,"
Output:
M92R R236G
G98K
M34* G87K M389L
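If each match needs to be handled individually, as in the body of your original loop, the same global match can also be driven from a while loop in scalar context, where every iteration resumes from the point at which the previous match ended. This is only a sketch: the variable names $info1 and $pos are borrowed from your loop, @seen is made up for the example, and the record is one of the samples.

```perl
use strict;
use warnings;

my $info1 = 'chr13-86597895-S-78:M34*,chr13-56891235-S-8:G87K,chr13-235689125-S-7:M389L,';
my $before = $info1;

my @seen;
# In scalar context a /g match continues from where the last one stopped,
# so the string is scanned once and never modified
while ($info1 =~ /:([A-Z0-9*]+)/g) {
    my $pos = $1;
    push @seen, $pos;
    print "$pos\n";
}
```

This prints M34*, G87K and M389L on separate lines and leaves $info1 untouched.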