1

I have a string and I want to remove all non-word characters and whitespace from it, so I thought a regular expression would be the right tool.

My regex looks like this (I defined it as a method in the String class):

/[\w&&\S]+/.match(self.downcase)

When I run this expression in Rubular with the test string "hello ..a.sdf asdf..," it highlights everything I need ("helloasdfasdf"), but when I do the same in irb I only get "hello".

Does anyone have an idea why that is?

Marek Lipka
tomet

2 Answers

3

Because you use match, which returns only the first match. If you use scan instead, you get every match:

string = "hello ..a.sdf asdf..,"
string.downcase.scan(/[\w&&\S]+/)
# => ["hello", "a", "sdf", "asdf"]
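To get the single cleaned-up string the question asks for, the pieces returned by scan can be joined back together. This is a minimal sketch using the question's test string; the variable names first, pieces, and cleaned are only illustrative.

```ruby
string = "hello ..a.sdf asdf..,"

# match stops at the first run of word characters and returns a MatchData object
first = string.downcase.match(/[\w&&\S]+/)
puts first[0]   # the first run only

# scan collects every run; join glues the pieces into one string
pieces  = string.downcase.scan(/[\w&&\S]+/)
cleaned = pieces.join
puts cleaned
```

Here first[0] is "hello", which is what irb showed, while cleaned is "helloasdfasdf".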
Marek Lipka
1

\w means [a-zA-Z0-9_]

\S means any non-whitespace character: letters, digits, the underscore, and all punctuation and symbols (!@#$%^&*(){}?>< and so on).

So combining \w and \S is redundant: every character that \w matches is already a non-whitespace character.

It's like asking for the intersection of India and Asia: obviously it's just India. So I suggest you simply use \w+.
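A quick way to check the redundancy is to compare both patterns on the same input. The sample string below is made up for illustration and only extends the question's test string with a tab, an underscore, and a digit.

```ruby
# Hypothetical sample input, not from the question
sample = "hello ..a.sdf asdf..,\tFoo_1 bar"

with_intersection = sample.scan(/[\w&&\S]+/)
plain_word_class  = sample.scan(/\w+/)

# Both patterns pick out exactly the same runs of characters
puts with_intersection == plain_word_class
```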

You can then use scan to get all matches, as mentioned in the other answer:

string = "hello ..a.sdf asdf..,"
string.scan(/\w+/)
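If the goal is only the cleaned string rather than the list of pieces, another option is to delete everything that is not a word character with gsub and \W. This is a sketch of an alternative, not what either answer proposes, and strip_non_word is a hypothetical helper name.

```ruby
# Hypothetical helper: lowercases the text and removes every non-word character.
# \W is the negation of \w, so it also covers whitespace.
def strip_non_word(text)
  text.downcase.gsub(/\W+/, "")
end

result = strip_non_word("hello ..a.sdf asdf..,")
puts result
```

For the question's test string this gives "helloasdfasdf", the same result as joining the scan output.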
aelor