  ^[[:space:]]*@

I can't figure out what the [[:space:]]* means in the above regular expression. Please help, thanks!

Joe
Runner

3 Answers


[:space:] is a POSIX character class which matches all whitespace characters, including line breaks.

In other words, [[:space:]] is identical to \s (since Perl 5.18 [1]).

http://www.regular-expressions.info/posixbrackets.html
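As a quick way to see which ASCII characters fall into the class, here is a minimal sketch using `tr`, which also understands POSIX class names. The sample string is made up for illustration: it puts a space, tab, line feed, vertical tab, form feed and carriage return between the letters `a` and `b`, and everything in `[:space:]` is deleted.

```shell
# Hypothetical sample: "a", then six whitespace characters, then "b".
# tr -d '[:space:]' deletes every character that belongs to the POSIX class.
cleaned=$(printf 'a \t\n\v\f\rb' | tr -d '[:space:]')

# Only the two letters are left.
printf '%s\n' "$cleaned"   # prints "ab"
```

Note that `tr` takes the class name inside a single pair of brackets, whereas in a regex the class has to sit inside a bracket expression, hence `[[:space:]]`.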


  1. Before 5.18, the vertical tab (U+000B) wasn't included in \s.

    $ diff -u <( unichars -au '\s' ) <( unichars -au '[[:space:]]' ) \
        && echo 'no difference'
    --- /dev/fd/63  2013-05-21 22:08:03.000000000 -0400
    +++ /dev/fd/62  2013-05-21 22:08:03.000000000 -0400
    @@ -1,5 +1,6 @@
      ---- U+00009 CHARACTER TABULATION
      ---- U+0000A LINE FEED (LF)
    + ---- U+0000B LINE TABULATION
      ---- U+0000C FORM FEED (FF)
      ---- U+0000D CARRIAGE RETURN (CR)
      ---- U+00020 SPACE
    
friedo
Bill
  • `\s` also doesn't always match U+00A0, the non-breaking space. `/u` makes sure it does. `/a` makes sure it doesn't. – ikegami May 22 '13 at 02:23

This is a POSIX character class, in this case a Unicode-friendly way of representing "any whitespace character".

See this page, scroll down to "POSIX Character Classes".

Platinum Azure
  • POSIX predates Unicode by a large margin. The rationale for abstract character classes (my ad-hoc term, don't remember the official terminology) was more generally related to portability across locales and character sets. – tripleee May 22 '13 at 05:58

There are a number of ways of expressing things like "whitespace character", and this is one of them. The set [...] allows the inclusion of things like [:space:] to add whitespace characters to the set.

This reads as:

    ^             # At the beginning of the string...
    [[:space:]]*  # ...zero or more whitespace characters...
    @             # ...followed by an at sign.
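To try the pattern out, here is a small sketch with `grep -E`, which accepts the same POSIX class. The sample lines are invented for illustration. One difference to keep in mind: `grep` works line by line, so `^` anchors at the start of each line, while in Perl without `/m` it anchors at the start of the string.

```shell
# Hypothetical input: three lines start with optional whitespace and then "@",
# and one line has "@" only after non-whitespace text.
sample=$(printf '@start\n  @indented\n\t@tabbed\nuser@example.com\n')

# Print the lines that match: everything except user@example.com.
printf '%s\n' "$sample" | grep -E '^[[:space:]]*@'

# Count the matching lines.
count=$(printf '%s\n' "$sample" | grep -c -E '^[[:space:]]*@')
echo "$count"   # prints 3
```

The `user@example.com` line is rejected because the characters before `@` are not whitespace, so `[[:space:]]*` cannot bridge the gap between the start of the line and the at sign.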
tadman
  • Not space characters, *whitespace* characters. Maybe that's what you meant, but accurate terminology is more important than usual in this case, because many regex beginners start out believing that `\s` and/or `[[:space:]]` *do* match only the space character (`U+0020`). – Alan Moore May 22 '13 at 04:23
  • Whitespace is a better term, adjusted accordingly. – tadman May 22 '13 at 05:14