
I'm trying to capture regex matches dynamically in Perl. I know that eval can help with this, but I may be doing something wrong.

Code:

use strict;
use warnings;

my %testHash = (
    '(\d+)\/(\d+)\/(\d+)'   =>  '$1$2$3'
);

my $str = '1/12/2016';

foreach my $pattern (keys (%testHash)) {
    my $value = $testHash{$pattern};
    my $result;

    eval {
        local $_ = $str;
        /$pattern/;
        print "\$1 - $1\n";
        print "\$2 - $2\n";
        print "\$3 - $3\n";
        eval { print "$value\n"; }
    }
}

Is it also possible to store captured regex patterns in an array?

criz

5 Answers


I believe what you really want is a dynamic version of the following:

say $str =~ s/(\d+)\/(\d+)\/(\d+)/$1$2$3/gr;

String::Substitution provides what we need to achieve that.

use String::Substitution qw( gsub_copy );

for my $pattern (keys(%testHash)) {
   my $replacement = $testHash{$pattern};
   say gsub_copy($str, $pattern, $replacement);
}

Note that $replacement can also be a callback. This permits far more complicated substitutions. For example, if you wanted to convert 1/12/2016 into 2016-01-12, you could use the following:

'(\d+)/(\d+)/(\d+)' => sub { sprintf "%d-%02d-%02d", @_[3,1,2] },

To answer your actual question:

use String::Substitution qw( interpolate_match_vars last_match_vars );

for my $pattern (keys(%testHash)) {
   my $template = $testHash{$pattern};

   $str =~ $pattern   # Or /$pattern/ if you prefer
      or die("No match!\n");

   say interpolate_match_vars($template, last_match_vars());
}
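
If installing String::Substitution is not an option, the same idea can be approximated with core Perl. This is only a minimal sketch, assuming the template contains nothing but simple $N placeholders; interpolate_captures is a hypothetical helper name, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: replace $1, $2, ... in a template with captures.
# $captures->[0] corresponds to $1, $captures->[1] to $2, and so on.
# Placeholders without a matching capture become the empty string.
sub interpolate_captures {
    my ($template, $captures) = @_;
    $template =~ s{\$(\d+)}{
        my $i = $1 - 1;
        ($i >= 0 && defined $captures->[$i]) ? $captures->[$i] : ''
    }ge;
    return $template;
}

my $str      = '1/12/2016';
my $pattern  = '(\d+)\/(\d+)\/(\d+)';
my $template = '$1$2$3';

if ($str =~ $pattern) {
    # Build the capture list from @- and @+ so that a pattern without
    # capture groups yields an empty list rather than (1).
    my @captures = map {
        defined $-[$_] ? substr($str, $-[$_], $+[$_] - $-[$_]) : undef
    } 1 .. $#+;

    print interpolate_captures($template, \@captures), "\n";   # prints 1122016
}
```

Because the placeholder is matched as \$(\d+), a template containing $10 is treated as the tenth capture rather than as $1 followed by a literal 0.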
ikegami
  • I don't think I understand what you mean. For a match without a capture group I get `@matches = ( 1 )`, which is probably because the match was successful so it returns 1, and `@- = (0)`. Both have the same number of elements, just different content in their single element. – simbabque Dec 01 '16 at 15:15
  • I didn't originally notice that `$testHash{$pattern}` was a template into which the captures needed to be interpolated. Fixed my answer. – ikegami Dec 01 '16 at 16:00

I am not completely sure what you want to do here, but I don't think your program does what you think it does.

You are using eval with a BLOCK of code. That's like a try block: if the code dies inside that eval block, the error is caught. It will not run your string as if it were code. You need a string eval for that.

Instead of explaining that, here's an alternative.

This program uses sprintf with numbered parameters. The %1$s syntax in the format string says: take the first argument (1$) and format it as a string (%s). You don't need to localize or assign to $_ to do a match; the =~ operator binds the match to any variable you like. I also use qr{} to create a quoted regular expression (essentially a variable containing a precompiled pattern) that I can use directly. Because the delimiters are {}, I don't need to escape the slashes.

use strict;
use warnings;
use feature 'say'; # like print ..., "\n"

my %testHash = (
    qr{(\d+)/(\d+)/(\d+)}         => '%1$s.%2$s.%3$s',
    qr{(\d+)/(\d+)/(\d+) nomatch} => '%1$s.%2$s.%3$s',
    qr{(\d+)/(\d+)/(\d\d\d\d)}    => '%3$4d-%2$02d-%1$02d',
    qr{\d}                        => '%s', # no capture group
);

my $str = '1/12/2016';

foreach my $pattern ( keys %testHash ) {
    my @captures = ( $str =~ $pattern );

    say "pattern: $pattern";

    if ($#+ == 0) {
        say "  no capture groups";
        next;
    }

    unless (@captures) {
        say "  no match";
        next;
    }

    # debug-output
    for my $i ( 1 .. $#- ) {
        say sprintf "  \$%d - %s", $i, $captures[ $i - 1 ];
    }

    say sprintf $testHash{$pattern}, @captures;
}

I included four examples:

  • The first pattern is the one you had. It uses %1$s and so on as explained above.
  • The second one does not match. We detect that by evaluating @captures in boolean (scalar) context, which gives its number of elements.
  • The third one shows that you can also reorder the result, or even use the sprintf formatting.
  • The last one has no capture group. We check by looking at $#+, the index of the last element of @+ (the $# prefix works on any array, just as $#array gives the last index of @array). @+ holds the end offsets of the last successful match and its submatches in the currently active dynamic scope. Its first element is the end of the overall match, so if it has only one element, the pattern has no capture groups.
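
The check on @+ described in the last bullet can be tried in isolation. A small sketch follows; capture_info is a hypothetical helper name used only for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: report whether $pattern matched $str and, if so,
# how many capture groups the pattern has, based on $#+ after the match.
sub capture_info {
    my ($str, $pattern) = @_;
    return +{ matched => 0 } unless $str =~ $pattern;

    # $+[0] is the end of the whole match, so $#+ counts the groups.
    return +{ matched => 1, groups => $#+ };
}

for my $pattern ('(\d+)/(\d+)/(\d+)', '\d', 'nomatch') {
    my $info = capture_info('1/12/2016', $pattern);
    printf "%-20s matched=%d groups=%s\n",
        $pattern, $info->{matched}, $info->{groups} // '-';
}
```

Reading $#+ only after a successful match matters, because a failed match leaves @+ unchanged.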

The output for me is this:

pattern: (?^:(\d+)/(\d+)/(\d\d\d\d))
  $1 - 1
  $2 - 12
  $3 - 2016
2016-12-01
pattern: (?^:(\d+)/(\d+)/(\d+) nomatch)
  no match
pattern: (?^:\d)
  no capture groups
pattern: (?^:(\d+)/(\d+)/(\d+))
  $1 - 1
  $2 - 12
  $3 - 2016
1.12.2016

Note that the order in the output is mixed up. That's because hashes are not ordered in Perl, and if you iterate over the keys in a hash without sort the order is random.
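
If the order of the patterns matters, one option is to keep them in an array of pairs instead of a hash. A minimal sketch with illustrative data; @rules and apply_rules are hypothetical names, not something from the code above:

```perl
use strict;
use warnings;

# An array keeps insertion order, unlike a hash. The qr// objects also stay
# real regex objects here instead of being stringified into hash keys.
my @rules = (
    [ qr{(\d+)/(\d+)/(\d\d\d\d)}, '%3$04d-%2$02d-%1$02d' ],
    [ qr{(\d+)/(\d+)/(\d+)},      '%1$s.%2$s.%3$s'       ],
);

# Apply every rule, in order, and return the formatted results.
sub apply_rules {
    my ($str) = @_;
    my @out;
    for my $rule (@rules) {
        my ($pattern, $format) = @$rule;
        my @captures = $str =~ $pattern or next;   # skip rules that do not match
        push @out, sprintf($format, @captures);
    }
    return @out;
}

print "$_\n" for apply_rules('1/12/2016');
# prints:
# 2016-12-01
# 1.12.2016
```

Iterating with sort keys %testHash would also give a stable order, but only an alphabetical one.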

simbabque

Apologies! I realize that both my question and my sample code were vague. After reading your suggestions, I came up with the following code. I haven't optimized it yet, and there is a limit to the replacement.

my $result;    # declared outside the loop so the final print can see it

foreach my $key (keys %testHash) {

    if ( $str =~ $key ) {
        my @matchArr = ($str =~ $key); # Capture all matches

        # Search and replace (limited from $1 to $9)
        for ( my $i = 0; $i < @matchArr; $i++ ) {
            my $num = $i+1;
            $testHash{$key} =~ s/\$$num/$matchArr[$i]/;
        }

        $result = $testHash{$key};

        last;
    }
}

print "$result\n";
criz

Evaluating a regex match in list context returns the captures. So in your example:

use Data::Dumper; # so we can see the result
foreach my $pattern (keys (%testHash)) {
    my @a = ($str =~/$pattern/);
    print Dumper(\@a);
}

would do the job.

HTH Georg

ikegami
Georg Mavridis
  • You get slightly odd results when the expression has no captures. See my answer. – ikegami Dec 01 '16 at 15:00
  • *list context*, not array. – Sinan Ünür Dec 01 '16 at 15:09
  • Hm, I'll get an empty array if nothing matches and an exception for a bad pattern. That's what I would have expected. – Georg Mavridis Dec 01 '16 at 15:10
  • Now try with a successful match with no captures (`'a' =~ /a/`). You'll get `[1]` instead of `[]`. – ikegami Dec 01 '16 at 15:12
  • @SinanÜnür Yes, I know. But the function we all hate using is called wantarray, so at least in my area we usually talk about array context (which AFAIK doesn't exist). But the change to list context from ikegami is certainly OK with me :) – Georg Mavridis Dec 01 '16 at 15:21
  • I'm not saying it's wrong, though I just noticed this just answers half the question. It also asks to process a string (e.g. `$1$2$3`) for capture interpolations. – ikegami Dec 01 '16 at 15:37

Is it also possible to store captured regex patterns in an array?

Of course it is possible to store captured substrings in an array:

#!/usr/bin/env perl

use strict;
use warnings;

my @patterns = map qr{$_}, qw{
    (\d+)/(\d+)/(\d+)
};

my $str = '1/12/2016';

foreach my $pattern ( @patterns ) {
    my @captured = ($str =~ $pattern)
        or next;
    print "'$_'\n" for @captured;
}

Output:

'1'
'12'
'2016'

I do not quite understand what you are trying to do with combinations of local, eval EXPR and eval BLOCK in your code and the purpose of the following hash:

my %testHash = (
    '(\d+)\/(\d+)\/(\d+)'   =>  '$1$2$3'
);

If you are trying to codify that this pattern should result in three captures, you can do that like this:

my @tests = (
    {
        pattern => qr{(\d+)/(\d+)/(\d+)},
        ncaptures => 3,
    }
);

my $str = '1/12/2016';

foreach my $test ( @tests ) {
    my @captured = ($str =~ $test->{pattern})
        or next;
    unless (@captured == $test->{ncaptures}) {
        # handle failure
    }
}

See this answer to find out how you can automate counting the number of capture groups in a pattern. Using the technique in that answer:

#!/usr/bin/env perl

use strict;
use warnings;

use Test::More;

my @tests = map +{ pattern => qr{$_}, ncaptures => number_of_capturing_groups($_) }, qw(
    (\d+)/(\d+)/(\d+)
);

my $str = '1/12/2016';

foreach my $test ( @tests ) {
    my @captured = ($str =~ $test->{pattern});
    ok @captured == $test->{ncaptures};
}

done_testing;

sub number_of_capturing_groups {
    "" =~ /|$_[0]/;
    return $#+;
}

Output:

ok 1
1..1
Sinan Ünür