1
if ($key =~ /^\s*(\S.*?)\s*$/o)

I am especially confused about the meaning of (\S.*?). Can anyone explain it for me? Thanks.

wuchang

4 Answers

6

This should help you understand

[Image: regular expression visualization]

Regarding this specifically: (\S.*?)

  • ( begins your capture group (group 1)
  • \S is a single non-whitespace character
  • . is any character except a newline (unless the s modifier is used); *? is "zero or more times, as few as possible" (non-greedy)
  • ) ends your capture group
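
To see what the non-greedy *? changes, here is a minimal sketch that runs the pattern from the question next to a greedy variant. The sample string and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $key = "  key   ";    # hypothetical input: 2 leading and 3 trailing spaces

# Greedy .* runs to the end of the string, so the trailing \s* is left
# with nothing to match and the spaces stay inside the capture.
my ($greedy) = $key =~ /^\s*(\S.*)\s*$/;

# Non-greedy .*? stops as early as the rest of the pattern allows,
# so the trailing \s* gets to consume the spaces.
my ($lazy) = $key =~ /^\s*(\S.*?)\s*$/;

print "greedy: [$greedy]\n";    # greedy: [key   ]
print "lazy:   [$lazy]\n";      # lazy:   [key]
```

Both patterns match the same strings; only the contents of the capture group differ.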

In plain English, your entire regex says:

  • From the beginning of the string
  • match zero or more whitespace
  • start a capture group
    • look for a non-whitespace character
    • followed by any character zero or more times (as few as possible)
  • end the capture group
  • match zero or more whitespace
  • then the end of the string
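
The steps above can be tried out with a short script. This is only a sketch: the helper name trimmed and the sample inputs are invented for illustration, and the /o modifier is left off because it makes no difference here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the captured
# group when the pattern matches, or undef when it does not.
sub trimmed {
    my ($key) = @_;
    return $key =~ /^\s*(\S.*?)\s*$/ ? $1 : undef;
}

for my $input ("  some key  ", "key", "a b  ", "   ", "") {
    my $result = trimmed($input);
    printf "%-16s => %s\n", "[$input]",
        defined $result ? "[$result]" : "no match";
}
```

The first three inputs come back as "some key", "key" and "a b". The last two (whitespace only, and the empty string) do not match, because \S requires at least one non-whitespace character.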

Kind of a weird regex, imho.

Mulan
  • Check the image source :) – Mulan Aug 30 '13 at 06:12
  • +1 for letting us know about this site, http://www.debuggex.com/. It helps a great deal in learning regex. Thanks. – Sid Aug 30 '13 at 06:23
  • Hey, you said "*?" is "zero or more times", but as far as I know, "*" is "zero or more times", so what's the difference between "*?" and "*"? – wuchang Aug 30 '13 at 08:01
  • @Vico_Wu `*?` is *zero or more times, as few as possible*. (`?` is *one or none*). Did the formatting of your comment get messed up? – amon Aug 30 '13 at 08:09
  • Yes, as far as I know, "?" means zero or one time and "*" means zero or more times, so why does "*?" become zero or more times? – wuchang Aug 30 '13 at 08:36
  • @Vico_Wu `*?` is just another quantifier. The quantifiers like `*` and `+` can be modified in the way *how* they match. `*?`/`+?` is the *non-greedy* version (there is also `*+`/`++`, a *possessive* version). All of this is documented in the [`perlre` manpage](http://perldoc.perl.org/perlre.html#Quantifiers). – amon Aug 30 '13 at 09:00
  • Thank you very much. I have already referred to the perldoc about greedy and non-greedy regexes; now I have a very clear understanding. Thank you. – wuchang Aug 31 '13 at 08:56
5

The code:

$key =~ /^\s*(\S.*?)\s*$/o

will, when the match succeeds, store in $1 a copy of $key with all leading and trailing whitespace characters (as defined by \s) removed.

The intention of the code seems to be to check that the string does not consist only of whitespace, and to obtain a trimmed string at the same time. However, this only holds under the assumption that the string does not span multiple lines; for a multi-line string such as "  somestring\nsomestring   \nmore string", the regex fails to match.

As a summary, the cases that the test rejects are:

  • Strings consisting only of whitespace (as defined by \s).
  • The empty string.
  • Strings that match the (unanchored) regex /\S.*\n.*\S/s (see note 1).

Note 1: Is . in Perl equivalent to [^\n]? Does it exclude anything else when the s modifier is not in effect?
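
The rejected cases can be checked with a small script. This is a sketch under stated assumptions: the variable names and the "trailing newline" sample are invented, and the last two lines only probe how . treats a lone newline with and without the s modifier.

```perl
use strict;
use warnings;

my $trim_re  = qr/^\s*(\S.*?)\s*$/;    # the pattern from the question
my $multi_re = qr/\S.*\n.*\S/s;        # newline between two non-whitespace chars

# Illustrative inputs; the third one is the multi-line example from the answer.
my @cases = (
    [ "whitespace only"  => " \t\n" ],
    [ "empty"            => "" ],
    [ "multi-line"       => "  somestring\nsomestring   \nmore string" ],
    [ "trailing newline" => "  somestring  \n" ],
);

for my $case (@cases) {
    my ($label, $string) = @$case;
    my $verdict  = $string =~ $trim_re  ? "matches" : "rejected";
    my $is_multi = $string =~ $multi_re ? "yes"     : "no";
    print "$label: $verdict (newline between non-whitespace: $is_multi)\n";
}

# Without /s, . does not match "\n"; with /s it does.
my $dot_plain = "\n" =~ /^.$/  ? 1 : 0;
my $dot_s     = "\n" =~ /^.$/s ? 1 : 0;
print "dot matches newline: plain=$dot_plain, with /s=$dot_s\n";
```

Only the "trailing newline" case matches: a newline after the last non-whitespace character is consumed by the final \s*, so it does not cause a rejection.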


As for the o modifier at the end, it appears to be obsolete in later versions of Perl. It prevented older versions of Perl from recompiling the pattern unnecessarily, but its current usefulness is limited to a few cases. Check the perlop documentation (search for /o) for more information.
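
The effect of /o only shows up when the pattern interpolates a variable, which the regex in the question does not. A minimal sketch with made-up words illustrates it:

```perl
use strict;
use warnings;

my (@with_o, @without_o);

for my $word (qw(foo bar)) {
    # With /o, $word is interpolated and compiled on the first pass only,
    # so the second pass still tests "bar" against /^foo$/.
    push @with_o, ("bar" =~ /^$word$/o ? 1 : 0);

    # Without /o, the pattern follows the current value of $word.
    push @without_o, ("bar" =~ /^$word$/ ? 1 : 0);
}

print "with /o:    @with_o\n";       # with /o:    0 0
print "without /o: @without_o\n";    # without /o: 0 1
```

For a constant pattern such as /^\s*(\S.*?)\s*$/ there is nothing to interpolate, so /o makes no observable difference.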

nhahtdh
  • You say it removes leading and trailing whitespace. More precisely, it stores the trimmed string in the special variable `$1`. Incidentally, you can shorten it a bit: `\S` might as well be `.`, since if the character in that position had been a `\s` it would have matched as part of the initial `\s*`. And `..*?` is equivalent to `.+?`. So you can change `(\S.*?)` to `(.+?)`. Alternatively, you could do something like `/(\S.*\S)/` (if you know that there are at least 2 non-whitespace characters). – David Knipe Aug 30 '13 at 07:43
  • @DavidKnipe: no, if you remove the \S you will make the regex match even if there are only whitespace characters. – ysth Aug 30 '13 at 07:48
  • That's true I suppose. Although it doesn't matter, assuming OP is expecting to get some non-whitespace in the string. – David Knipe Aug 30 '13 at 08:01
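
The point raised by ysth can be checked with a short sketch; the three-space input and the variable names are invented for illustration:

```perl
use strict;
use warnings;

my $blank = "   ";    # three spaces, no non-whitespace character

# Original pattern: \S has nothing to match, so the match fails.
my $original_matches = $blank =~ /^\s*(\S.*?)\s*$/ ? 1 : 0;

# Shortened pattern: the leading \s* backtracks and gives one space to .+?
my ($shortened_capture) = $blank =~ /^\s*(.+?)\s*$/;

print "original matches:  $original_matches\n";        # original matches:  0
print "shortened capture: [$shortened_capture]\n";     # shortened capture: [ ]
```

So the two patterns agree on strings that contain some non-whitespace, but only the original rejects whitespace-only input.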
1

Apart from the strict meaning of this regex (well documented by @naomik), the whole statement:

if ($key =~ /^\s*(\S.*?)\s*$/o)

means:

If $key matches the regex, the group $1 will contain the same as $key without leading and trailing whitespace.
The /o modifier (now obsolete) avoids recompilation of the regex. You should use qr/^\s*(\S.*?)\s*$/ instead:

my $re = qr/^\s*(\S.*?)\s*$/;
if ($key =~ $re) {
    # $1 now holds $key without leading and trailing whitespace
}
Toto
  • The `/o` modifier is obsolete because perl will cache the compiled regex if it doesn't change dynamically (i.e. if no variables are interpolated). Therefore, neither `/o` nor `qr//` is necessary here. – amon Aug 30 '13 at 07:43
1

A lot of info here:

http://perldoc.perl.org/perlre.html#Regular-Expressions

I also use this to check regexes before I use them. I find it very helpful, and it also gives a good explanation of what each step is doing:

http://www.regex101.com/r/xV1vO6

fugu