10

I am working on a project which at one point gets a list of files from an ftp server. At that point it either returns an arrayref of files OR if an optional regex reference (i.e. qr), is passed it filters the list down using grep. Further if that qr has a capture group, it treats the captured section as a version number and returns instead a hashref where the keys are the versions and the values are the file names (which would have been returned as the array if no capture groups). The code looks like (simplified slightly)

sub filter_files {
  my ($files, $pattern) = @_;
  my @files = @$files;
  unless ($pattern) {
    return \@files;
  }

  @files = grep { $_ =~ $pattern } @files;
  carp "Could not find any matching files" unless @files;

  my %versions = 
    map { 
      if ($_ =~ $pattern and defined $1) { 
        ( $1 => $_ )
      } else {
        ()
      }
    } 
    @files;

  if (scalar keys %versions) {
    return \%versions;
  } else {
    return \@files;
  }
}

This implementation tries to create the hash and returns it if it succeeds. My question, is can I detect that the qr has a capture group and only attempt to create the hash if it does?

cjm
  • 61,471
  • 9
  • 126
  • 175
Joel Berger
  • 20,180
  • 5
  • 49
  • 104

3 Answers3

20

You could use something like:

sub capturing_groups{
    my $re = shift;
    "" =~ /|$re/;
    return $#+;
}

say capturing_groups qr/fo(.)b(..)/;

Output:

2
Qtax
  • 33,241
  • 9
  • 83
  • 121
  • 2
    let me see if I get it: you match an empty string against the alternation of nothing or my regex. The nothing lets it match, but it still populates @+, which then has the number of capture groups as the number of elements. Am I right? Very clever! – Joel Berger Dec 28 '11 at 16:51
  • 2
    (correction) ... which then has one more element than captures (since $+[0] is the match) but since $#+ is one less than the number of elements, it returns the correct number of matches – Joel Berger Dec 28 '11 at 17:05
  • So, only `perl` can compile `Perl`, and one might as well extend that to only `perl` should try to parse Perl Regexps? :-) Love it. – BRPocock Dec 28 '11 at 22:02
  • @BRPocock, I like the sentiment, but in truth I think that the other guy's function probably work too. It is neat that this can be accomplished in this fashion; for more complicated things, perl might not return enough information for us, so its important to know that parsers exist and work, along with their limitations. – Joel Berger Dec 28 '11 at 23:03
  • Absolutely. I do rather like the homoiconicity of this, though, and rather wish there was more introspection available like this (in any/every given language). – BRPocock Dec 28 '11 at 23:06
4

See nparen in Regexp::Parser.

use strictures;
use Carp qw(carp);
use Regexp::Parser qw();
my $parser = Regexp::Parser->new;

sub filter_files {
    my ($files, $pattern) = @_;
    my @files = @$files;
    return \@files unless $pattern;

    carp sprintf('Could not inspect regex "%s": %s (%d)',
        $pattern, $parser->errmsg, $parser->errnum)
        unless $parser->regex($pattern);

    my %versions;
    @files = map {
        if (my ($capture) = $_ =~ $pattern) {
            $parser->nparen
                ? push @{ $versions{$capture} }, $_
                : $_
        } else {
            ()
        }
    } @files;
    carp 'Could not find any matching files' unless @files;

    return (scalar keys %versions)
        ? \%versions
        : \@files;
}

Another possibility to avoid inspecting the pattern is to simply rely on the value of $capture. It will be 1 (Perl true value) in the case of a successful match without capture. You can distinguish it from the occasional capture returning 1 because that one lack the IV flag.

daxim
  • 39,270
  • 4
  • 65
  • 132
  • 1
    same as for toolic, thanks for doing the hard work on this one, but I think I will accept Qtax's answer. It seems to me that relying on Perl's own implementation of the regex engine seems more foolproof rather than parsing. Thanks though! I really did expect that answers would end up being something like this. – Joel Berger Dec 28 '11 at 17:09
3

You could use YAPE::Regex to parse the regular expression to see if there is a capture present:

use warnings;
use strict;
use YAPE::Regex;

filter_files(qr/foo.*/);
filter_files(qr/(foo).*/);

sub filter_files {
    my ($pattern) = @_;
    print "$pattern ";
    if (has_capture($pattern)) {
        print "yes capture\n";
    }
    else {
        print "no capture\n";
    }
}

sub has_capture {
    my ($pattern) = @_;
    my $cap = 0;
    my $p = YAPE::Regex->new($pattern);
    while ($p->next()) {
        if (scalar @{ $p->{CAPTURE} }) {
            $cap = 1;
            last;
        }
    }
    return $cap;
}

__END__

(?-xism:foo.*) no capture
(?-xism:(foo).*) yes capture
toolic
  • 57,801
  • 17
  • 75
  • 117