
I was trying to use `gsub` to remove non-word characters from a string in a Rails app. I used the following code (here `somestring` is `'kkk'`):

somestring.gsub(/[\W]/i, '')  #=> ""

But this is actually incorrect: it removes the letter `k` as well. The correct version should be:

somestring.gsub(/\W/i, '')  #=> "kkk"
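Since `\W` only ever matches non-word characters, the `/i` flag contributes nothing here. Below is a minimal sketch of a helper that drops both the brackets and the flag. The method name `strip_non_word` is hypothetical and is not something Rails provides:

```ruby
# Hypothetical helper: remove every non-word character from a string.
# In Ruby 1.9+, \w matches [a-zA-Z0-9_], so \W matches everything else.
# No /i flag and no character class, which sidesteps the [\W] + /i problem.
def strip_non_word(str)
  str.gsub(/\W/, '')
end

puts strip_non_word('kkk')         # prints kkk
puts strip_non_word('a-b c! d_1')  # prints abcd_1
```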

My problem is that the RSpec unit test for the Rails controller containing the code above does not catch this: the test actually passes. So I created a fairly extreme test case in RSpec:

it "test this gsub" do
  'kkk'.gsub(/[\W]/i, '').should == 'kkk'
end

The test case above should fail, but it actually passes. What is the problem here? Why does the test pass?

Ben
  • Why would the test fail? `/[\W]/i` is a completely valid regexp for that task as far as I can see. Brackets are unnecessary in that case, but it doesn't hurt anything. – KL-7 Apr 27 '12 at 15:19
  • Did you actually try your regexps in `irb`? `"kkk".gsub(..)` works like it should and returns `"kkk"`, so the test passes. What result are you expecting? – Casper Apr 27 '12 at 15:20
  • @Casper Actually, when running `'kkk'.gsub(/[\W]/i, '')` I get `""`. In comparison, running `'kkk'.gsub(/\W/i, '')` returns `"kkk"`. – Andrew Marshall Apr 27 '12 at 15:21
  • Eh wot? `k` is a "word" character. And `\W` matches **non-word** characters. On my Ruby installation I get `"kkk"` when running in `irb`. – Casper Apr 27 '12 at 15:24
  • @Casper Yea I know, it doesn't make any sense. I'm running 1.9.3-p194. 1.9.2-p318 has the same behavior, but 1.8.7-p358 returns `"kkk"` as expected. – Andrew Marshall Apr 27 '12 at 15:26
  • My apologies, I was testing it on 1.8.7 where it works as expected. In 1.9 I get the same result as Andrew Marshall. – KL-7 Apr 27 '12 at 15:33
  • Yep..I just tried in 1.9, same here. Bug? – Casper Apr 27 '12 at 15:36
  • Looks like. Though, everything works fine if you remove `/i` flag. Do you really need ignore-case flag for non-word characters? – KL-7 Apr 27 '12 at 15:36
  • Alright, this is rather odd: `'jklfds'.gsub(/[\W]/i, '')` yields `"jlfd"`. This regexp seems to really confuse Ruby. – Andrew Marshall Apr 27 '12 at 15:46
  • Behaviour confirmed on 1.9.3p0 – steenslag Apr 27 '12 at 15:54
  • Forgot to mention that my ruby version is ruby-1.9.3-p125 – Ben Apr 27 '12 at 16:01
  • My question is why the unit test would pass. If this is a ruby language bug, then the unit test should not pass, it should fail. – Ben Apr 27 '12 at 16:06
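Andrew Marshall's `'jklfds'` result suggests that only some letters trip the bug (there, `k` and `s` were removed). Below is a small diagnostic sketch that lists which ASCII word characters `/[\W]/i` matches on the running interpreter. Every character it checks is a word character, so an interpreter without the bug should print an empty list. The method name `misfiring_chars` is hypothetical. It uses `=~` rather than `match?` so that it also runs on 1.9:

```ruby
# Hypothetical diagnostic: which ASCII word characters does the pattern match?
# None of them should match /[\W]/i, because \W means "non-word character".
def misfiring_chars(pattern = /[\W]/i)
  word_chars = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a + ['_']
  word_chars.select { |c| c =~ pattern }
end

puts "Ruby #{RUBY_VERSION}: #{misfiring_chars.inspect}"
```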

1 Answer


Ruby 1.9 switched to a different regular expression engine (Oniguruma), which accounts for the behavior change. This looks like a bug in that engine.

For your example, you can get around the issue by not specifying a case insensitive match:

irb(main):001:0> 'kkk'.gsub(/[\W]/i, '')
=> ""
irb(main):002:0> 'kkk'.gsub(/[\W]/, '')
=> "kkk"
irb(main):003:0> 'kkk'.gsub(/\W/i, '')
=> "kkk"
irb(main):004:0> 'kkk'.gsub(/\W/, '')
=> "kkk"

Update: Removing the character class (using `/\W/i` instead of `/[\W]/i`) looks like another workaround. Perhaps a negated shorthand class like `\W` is mishandled inside a character class when `/i` is set?
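A further workaround, not part of the original answer, is to skip regular expressions altogether. `String#delete` takes `tr`-style character sets, and a leading `^` negates the set, so deleting `'^a-zA-Z0-9_'` removes everything outside the characters that `\w` matches in Ruby 1.9+. A sketch; `keep_word_chars` is a hypothetical name:

```ruby
# Hypothetical alternative: delete every character that is NOT in a-z, A-Z, 0-9 or _.
# String#delete uses tr-style sets; the leading ^ negates the set.
def keep_word_chars(str)
  str.delete('^a-zA-Z0-9_')
end

%w[kkk jklfds a-b!c x_1.y].each do |s|
  # Print both versions side by side; they should agree.
  puts "#{s.inspect} -> #{keep_word_chars(s).inspect} (gsub: #{s.gsub(/\W/, '').inspect})"
end
```

Because no regexp is involved, case-insensitive matching never comes into play. The trade-off is that the character set is spelled out by hand.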

Nevir
  • Do you think… no it couldn't… it's taking `\W` and making it `\w` because it's case insensitive? It couldn't actually be doing that, right?? O_O – Andrew Marshall Apr 27 '12 at 15:36
  • I hope not... But you never know. This should probably be brought up on http://bugs.ruby-lang.org to confirm where the blame lies – Nevir Apr 27 '12 at 15:38
  • Bug confirmed here http://www.rubular.com/ too. You can switch between 1.8.7 and 1.9.2 and see the difference. – Casper Apr 27 '12 at 15:41
  • @AndrewMarshall, I doubt it's that stupid =) Btw, `/[\S]/i` works just fine. – KL-7 Apr 27 '12 at 15:41
  • @KL-7 Yea but you never know `;)`. And nice, that's somewhat reassuring. – Andrew Marshall Apr 27 '12 at 15:43
  • Btw just tested this on Ruby 2.0 dev and it still exists. – Andrew Marshall Apr 27 '12 at 15:50
  • Here's an already existing [Ruby issue about this](http://bugs.ruby-lang.org/issues/4044). – Andrew Marshall Apr 27 '12 at 15:52
  • Gosh, I didn't expect this question to be that popular. Apparently lots of people didn't know this. Thanks for the answers! =D – Ben Apr 27 '12 at 16:00
  • Actually I still don't know why the code works in the RSpec test! Is there anything special about `gsub` in an RSpec test? – Ben Apr 27 '12 at 16:05
  • It looks like it's an issue when the regexps are in unicode mode - my guess is that your rails env has a different default encoding than your test environment – Nevir Apr 27 '12 at 16:12
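One way to check Nevir's guess is to record the relevant encodings in both places (the controller and the spec) and compare them. Below is a minimal sketch that uses only core Ruby (`__ENCODING__`, `Encoding.default_external`, `Encoding.default_internal`, `String#encoding`, `Regexp#encoding`). The name `encoding_report` is hypothetical. Whether an encoding difference is really what makes the spec pass is an assumption that this would help confirm or rule out. The literals are passed in as arguments because `__ENCODING__` and the encodings of string and regexp literals are fixed by the file they appear in:

```ruby
# Hypothetical helper: gather the encodings that may affect how the
# gsub call behaves. Pass in values created in the caller's own file.
def encoding_report(source_encoding, input, pattern)
  {
    ruby_version:     RUBY_VERSION,
    source_encoding:  source_encoding,            # caller's __ENCODING__
    default_external: Encoding.default_external,
    default_internal: Encoding.default_internal,  # often nil
    input_encoding:   input.encoding,
    pattern_encoding: pattern.encoding,
    result:           input.gsub(pattern, '')
  }
end

# In the controller or the spec, build the literals *there* and print the report:
report = encoding_report(__ENCODING__, 'kkk', /[\W]/i)
report.each { |key, value| puts "#{key}: #{value.inspect}" }
```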