
I checked the documentation but could not find what [\w-] means. Can anyone tell me what [\w-] means in Ruby?

mu is too short
Adam Lee

2 Answers


The square brackets [] denote a character class. A character class matches any one of the characters listed inside it.

\w is a special class called "word characters". It is shorthand for [a-zA-Z0-9_], so it will match:

  • a-z (all lowercase letters)
  • A-Z (all uppercase letters)
  • 0-9 (all digits)
  • _ (an underscore)

The class you are asking about, [\w-], is a class consisting of \w and -. So it will match the above list, plus hyphens (-).

Exactly as written, [\w-] matches a single character: either one from the list above or a hyphen.
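
As a rough illustration of the difference between \w and [\w-], here is a minimal sketch that tests single characters against both. The variable names (word_only, word_or_dash, samples, results) are made up for this example, and the \A and \z anchors are added only so that each test string must be exactly one character long.

```ruby
# \w alone rejects the hyphen; the class [\w-] accepts it.
word_only    = /\A\w\z/     # exactly one word character
word_or_dash = /\A[\w-]\z/  # exactly one word character or a hyphen

samples = ["a", "Z", "7", "_", "-", ","]

# Build [character, matches \w?, matches [\w-]?] for each sample.
results = samples.map do |ch|
  [ch, word_only.match?(ch), word_or_dash.match?(ch)]
end

results.each do |ch, w, wd|
  puts "#{ch.inspect}: \\w=#{w}  [\\w-]=#{wd}"
end
```

Only the "-" row should differ between the two columns. The hyphen is taken literally here because it sits at the end of the class; between two characters, as in a-z, it denotes a range instead.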

If you were to add a quantifier to the end, e.g. [\w-]* or [\w-]+, then it would match any of these strings:

fooBar9
foo-Bar9
foo-Bar-9
-foo-Bar---9abc__34ab12d

And it would partially match these:

foo,Bar9                    # match 'foo' - the ',' stops the match
-foo-Bar---9*bc__34ab12d    # match '-foo-Bar---9', the '*' stops the match
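
The full and partial matches above can be checked with a short sketch like the following. It assumes the + quantifier and uses String#[] to pull out the first match; the names pattern, inputs, and first_matches are illustrative only.

```ruby
pattern = /[\w-]+/

inputs = [
  "fooBar9",
  "foo-Bar-9",
  "foo,Bar9",
  "-foo-Bar---9*bc__34ab12d",
]

# String#[] with a regex returns the first matched substring (or nil).
first_matches = inputs.map { |s| s[pattern] }

inputs.zip(first_matches).each do |input, matched|
  status = matched == input ? "full match" : "partial match"
  puts "#{input.inspect} -> #{matched.inspect} (#{status})"
end
```

If the goal is to accept a string only when every character is a word character or hyphen, one option is to anchor the pattern, e.g. /\A[\w-]+\z/.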
Dan Lowe
\w  Any word character (letter, number, underscore)

Here is what I think it is doing. Go to Rubular and try the following:

regex_1: /\w-/

string: f-oo

regex_1 matches only f- (one word character followed by a literal -) and stops there; the rest of the string, oo, is not matched.

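Whereas:

regex_2: /[\w-]/

string: f-oo

regex_2 matches every character of the string, including the special character -: f, -, o, o. Each match is a single character, but together they cover the entire string f-oo.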

.. I also tested a string like f-1oo. Because \w includes digits, [\w-] matches every character there as well (f, -, 1, o, o); the match does not stop at f-.
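
The same comparison can be run outside Rubular. This is a minimal sketch using String#scan, which returns every non-overlapping match; the names sequence, char_class, seq_hits, class_hits, and digit_hits are invented for the example.

```ruby
sequence   = /\w-/    # one word character, then a literal hyphen
char_class = /[\w-]/  # one character: a word character or a hyphen

seq_hits   = "f-oo".scan(sequence)     # two-character matches only
class_hits = "f-oo".scan(char_class)   # one match per allowed character
digit_hits = "f-1oo".scan(char_class)  # digits are word characters too

puts "sequence on f-oo:    #{seq_hits.inspect}"
puts "char class on f-oo:  #{class_hits.inspect}"
puts "char class on f-1oo: #{digit_hits.inspect}"
```

Joining class_hits or digit_hits back together reproduces the original string, which shows that the class covers every character, including the digit.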

==========

I believe the whole point of [] is that - becomes just another character in the set, so matching can continue before and after it. Here are some variations I tried in irb.

irb(main):004:0> "blah-blah".scan(/\w-/)  
=> ["h-"]
irb(main):005:0> "blah-blah".scan(/[\w-]/)  
=> ["b", "l", "a", "h", "-", "b", "l", "a", "h"]
irb(main):006:0> "blah-blah".scan(/\w-\w/)  
=> ["h-b"]
irb(main):007:0> "blah-blah".scan(/\w-\w*$/)  
=> ["h-blah"]
irb(main):008:0> "blah-blah".scan(/\w*-\w*$/)  
=> ["blah-blah"]
z atef