Do it with a regular expression that uses the `/g` flag and the `\G` anchor, but in scalar context. In scalar context, `/g` remembers the position in the string right after the last match (or the start of the string before the first one), and `\G` anchors the next match at exactly that spot. You can walk along the string this way. Get the length, skip over the colon, and then use `substr` to pick up the right number of characters. You can actually assign to `pos`, so advance it past the characters you just extracted. `redo` that until there are no more matches:
```perl
use v5.10.1;

LINE: while( my $line = <DATA> ) {
    chomp( $line );
    {   # a bare block is a loop that runs once; redo restarts it
        say $line;
        next LINE unless $line =~ m/\G(\d+):/g; # scalar /g!
        say "\t1. pos is ", pos($line);
        my( $length, $string ) = ( $1, substr $line, pos($line), $1 );
        pos($line) += $length;  # skip over the characters just extracted
        say "\t2. pos is ", pos($line);
        print "\tFound length $length with [$string]\n";
        redo;
    }
}

__END__
4:spam6:Roscoe
6:Buster10:green eggs
4:abcd5:123:44:Mimi
```
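If you need this in more than one place, the same `\G`-and-`pos` walk can be packaged as a subroutine. This is a minimal sketch, not part of the program above: the name `parse_length_prefixed` is hypothetical, and it assumes that a record claiming more characters than remain, or anything left over after the last record, is an error. It uses `/gc` instead of plain `/g` so that a failed match leaves `pos` where parsing stopped instead of resetting it, which lets the sub report the offset.

```perl
use v5.10.1;
use strict;
use warnings;

# Hypothetical helper: split a string of length-prefixed records
# such as "4:spam6:Roscoe" into a list of payloads.
# Assumption: truncated records and trailing junk are fatal errors.
sub parse_length_prefixed {
    my ($data) = @_;
    my @records;
    pos($data) = 0;

    # \G anchors each match where the previous record ended. The /c flag
    # keeps pos() in place when the match finally fails.
    while ( $data =~ m/\G(\d+):/gc ) {
        my $length = $1;
        my $start  = pos($data);
        die "Record at offset $start wants $length characters, but only "
          . ( length($data) - $start ) . " remain\n"
          if $start + $length > length $data;
        push @records, substr( $data, $start, $length );
        pos($data) = $start + $length;    # jump past the payload
    }

    # If the walk stopped before the end, something wasn't a "digits:" header.
    my $stopped = pos($data) // 0;
    die "Unparseable data at offset $stopped\n" if $stopped != length $data;
    return @records;
}

say join '|', parse_length_prefixed('4:abcd5:123:44:Mimi');    # abcd|123:4|Mimi
```

Because each header is only looked for where the previous payload ended, the `3:` and `44:` inside the third record's data are never mistaken for headers.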
Notice the edge case in the last input line: the `3:` inside `123:4` is part of the string, not the start of a new record. My output is:
```
4:spam6:Roscoe
	1. pos is 2
	2. pos is 6
	Found length 4 with [spam]
4:spam6:Roscoe
	1. pos is 8
	2. pos is 14
	Found length 6 with [Roscoe]
4:spam6:Roscoe
6:Buster10:green eggs
	1. pos is 2
	2. pos is 8
	Found length 6 with [Buster]
6:Buster10:green eggs
	1. pos is 11
	2. pos is 21
	Found length 10 with [green eggs]
6:Buster10:green eggs
4:abcd5:123:44:Mimi
	1. pos is 2
	2. pos is 6
	Found length 4 with [abcd]
4:abcd5:123:44:Mimi
	1. pos is 8
	2. pos is 13
	Found length 5 with [123:4]
4:abcd5:123:44:Mimi
	1. pos is 15
	2. pos is 19
	Found length 4 with [Mimi]
4:abcd5:123:44:Mimi
```
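That edge case is why the lengths have to drive the walk. For contrast, here is a sketch of a naive approach (shown only to illustrate the pitfall, not as a solution) that grabs every run of digits followed by a colon in list context. It picks up `123` and `44` as lengths, even though the real headers in that line are `4`, `5`, and `4`.

```perl
use strict;
use warnings;

# Naive approach, shown only for contrast: list-context /g with no \G
# finds every "digits:" run, including ones inside payloads.
my $line    = '4:abcd5:123:44:Mimi';
my @headers = $line =~ m/(\d+):/g;
print "@headers\n";    # prints "4 5 123 44"
```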
I figured there might be a module for this, and there is: Bencode. Bencoding, the format BitTorrent uses, encodes strings exactly this way: a length, a colon, and then that many bytes. That means I did a lot of work for nothing. Always look at CPAN first. Even if you don't use the module, you can look at its solution :)