cryptic perl expression

Question

I find the following statement in a perl (actually PDL) program:

/\/([\w]+)$/i;

Can someone decode this for me, an apprentice in perl programming?

Tim · Accepted Answer · 2012-09-11T00:16:03.970

Sure, I'll explain it from the inside out:

\w - matches a single character that can be used in a word (alphanumeric, plus '_')

[...] - matches a single character from within the brackets

[\w] - matches a single character that can be used in a word (kinda redundant here)

+ - matches the previous element (here, the bracketed character class) one or more times, taking as many characters as possible.

[\w]+ - matches a group of word characters, many times over. This will find a word.

(...) - grouping. remember this set of characters for later.

([\w]+) - match a word, and remember it for later

$ - end-of-line. Anchors the match at the end of the string, or just before a trailing newline.

([\w]+)$ - match the last word on a line, and remember it for later

\/ - a single slash character '/'. it must be escaped by backslash, because slash is special.

\/([\w]+)$ - match the last word on a line, after a slash '/', and remember the word for later. This is probably grabbing the directory/file name from a path.

/.../ - match syntax

/.../i - i means case-insensitive.

All together now:

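/\/([\w]+)$/i; - match the last word on a line and remember it for later; the word must come directly after a slash. Basically, grab the last component (file or directory name) of a path, provided that component contains only word characters: a name with a dot or hyphen in it, such as notes.txt, will not match. The case-insensitive flag is irrelevant, since \w already matches both cases.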
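
To make the walkthrough above concrete, here is a minimal sketch that applies the same pattern to a few sample strings. The helper name last_word_after_slash and the sample paths are assumptions made for illustration; they do not come from the original PDL program.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the word after the last slash, or undef
# if the string does not end in "/<word characters>".
sub last_word_after_slash {
    my ($path) = @_;
    # In list context a successful match returns its captures;
    # a failed match returns an empty list, leaving $word undefined.
    my ($word) = $path =~ /\/([\w]+)$/;
    return $word;
}

# Sample inputs chosen for illustration only.
for my $path ('/usr/local/bin', 'lib/PDL', '/home/user/notes.txt', 'plain') {
    my $word = last_word_after_slash($path);
    printf "%-22s => %s\n", $path, defined $word ? $word : '(no match)';
}
```

The first two strings yield bin and PDL. The third does not match because the dot in notes.txt is not a word character, and the fourth contains no slash at all. The /i flag is left off here because it makes no difference to \w.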

More details about Perl regex here: http://www.troubleshooters.com/codecorn/littperl/perlreg.htm

And as JRFerguson pointed out, YAPE::Regex::Explain is useful for tokenizing regex, and explaining the pieces.

Thank you Tim...excellent description and fits perfectly with the context of the code. Looks like I need more regex experience after 40+ years of programming! — BoomerTN, Sep 10 '12 at 18:37

score 5 · Answer 2 · edited May 23 '17 at 12:21

You will find the YAPE::Regex::Explain module worth installing.

#!/usr/bin/env perl
use YAPE::Regex::Explain;
#...may need to single quote $ARGV[0] for the shell...
print YAPE::Regex::Explain->new( $ARGV[0] )->explain;

Assuming this script is named 'rexplain' do:

$ ./rexplain '/\/([\w]+)$/i'

...to obtain:

The regular expression:

(?-imsx:/\/([\w]+)$/i)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  /                        '/'
----------------------------------------------------------------------
  \/                       '/'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [\w]+                    any character of: word characters (a-z,
                             A-Z, 0-9, _) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
  /i                       '/i'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

UPDATE:

See also: https://stackoverflow.com/a/12359682/1015385 . As noted there and in the module's documentation:

There is no support for regular expression syntax added after Perl version 5.6, particularly any constructs added in 5.10.

score 2 · Answer 3 · answered Sep 10 '12 at 18:17

/\/([\w]+)$/i;

It is a regex, and if it is a complete statement, it is applied to the $_ variable, like so:

$_ =~ /\/([\w]+)$/i;

It looks for a slash \/, followed by an alphanumeric string \w+, followed by end of line $. It also captures () the alphanumeric string, which ends up in the variable $1. The /i on the end makes it case-insensitive, which has no effect in this case.
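
As a rough illustration of the implicit $_ form, the sketch below loops over a few made-up strings (the original program's input is not shown, so this data is an assumption) and reads $1 only when the match succeeded.

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration.
my @paths = ('/data/run42', 'images/m51', 'README');

my @names;
for (@paths) {                # each element is aliased to $_ in turn
    if (/\/([\w]+)$/) {       # shorthand for: $_ =~ /\/([\w]+)$/
        push @names, $1;      # $1 holds the captured word for this match
    }
}
print "@names\n";
```

This prints run42 m51; README is skipped because it contains no slash. Wrapping the match in an if is one common way to make sure $1 is used only after a successful match.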

score 2 · Answer 4 · answered Sep 11 '12 at 03:14

While it doesn't help "explain" a regex, once you have a test case, Damian's new Regexp::Debugger is a cool utility to watch what actually occurs during the matching. Install it and then do rxrx at the command line to start the debugger, then type in /\/([\w]+)$/ and '/r' (for example), and finally m to start the matching. You can then step through the debugger by hitting enter repeatedly. Really cool!

score 0 · Answer 5 · answered Sep 10 '12 at 18:22

This matches $_ against a slash followed by one or more word characters at the end of the string (case-insensitively), and stores those word characters in $1.

$_ value     then     $1 value 
------------------------------
"/abcdes"     |       "abcdes"
"foo/bar2"    |       "bar2"
"foobar"      |       undef      # no slash so doesn't match

vol7ron

`$1` is undef unless a previous match, in this case its value is unchanged. – Toto Sep 11 '12 at 11:54
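
The point in that comment can be checked with a short sketch. The variable names @seen and $name are illustrative assumptions; the strings are the ones from the table above.

```perl
use strict;
use warnings;

my @seen;

'foo/bar2' =~ /\/([\w]+)$/;   # succeeds and sets $1 to "bar2"
push @seen, $1;

'foobar' =~ /\/([\w]+)$/;     # fails, so $1 is left untouched
push @seen, $1;               # still "bar2", not undef

# Capturing in list context avoids the stale value:
# a failed match returns an empty list, so $name stays undefined.
my ($name) = 'foobar' =~ /\/([\w]+)$/;

print "after failed match, \$1 is still: $seen[1]\n";
print "list-context capture is ", (defined $name ? $name : 'undef'), "\n";
```

Both entries in @seen are bar2, which shows that a failed match does not reset $1. Assigning the match result in list context, or testing the match in an if, is a safer way to read the capture.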

score 0 · Answer 6 · answered Sep 11 '12 at 00:24

The Online Regex Analyzer deserves a mention. Here's a link to explain what your regex means, and pasted here for the record.

Sequence: match all of the followings in order

/                                                  (slash)
                                               --+
Repeat                                           | (in GroupNumber:1)
   AnyCharIn[ WordCharacter] one or more times   |
                                               --+
EndOfLine
