5

I'm doing some screen scraping and I'm getting back a string that appears to end with whitespace, but neither string.strip nor string.gsub(/\s/u, '') removes the character.

I'm guessing it's a character encoding issue. Any suggestions?

Sam

3 Answers

12

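I think there are many different "space characters", and strip only removes the ASCII ones. "\302\240" is the UTF-8 byte sequence for a non-breaking space (U+00A0), so you can use something like this: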

my_string.gsub("\302\240", ' ').strip
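To illustrate why this works, here is a minimal sketch. It assumes the trailing character really is a non-breaking space; the sample string and the variable names are made up for the example and are not from the question.

```ruby
# Hypothetical scraped value: visible text followed by a non-breaking space.
nbsp = "\u00A0"                  # U+00A0 NO-BREAK SPACE
scraped = "price: 42" + nbsp

# In UTF-8 this character is the two bytes 0xC2 0xA0, i.e. octal \302\240.
nbsp_bytes = nbsp.bytes

# String#strip only removes ASCII whitespace (and null), so U+00A0 survives.
still_dirty = scraped.strip

# Replace the non-breaking space with a regular space, then strip as usual.
cleaned = scraped.gsub(nbsp, ' ').strip

puts nbsp_bytes.inspect
puts still_dirty.inspect
puts cleaned.inspect
```

If the invisible character turns out to be something other than U+00A0, this exact replacement will not match it.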
taro
  • 1
    `my_string.tr("\302\240", ' ').strip` should be a bit faster – lulalala Jul 11 '14 at 06:50
  • 2
    Worth noting -- http://stackoverflow.com/questions/2588942/convert-non-breaking-spaces-to-spaces-in-ruby -- "Use `/[[:space:]]/` to match all whitespace, including Unicode whitespace like non-breaking spaces. This is unlike `/\s/`, which matches only ASCII whitespace." – DreadPirateShawn Aug 10 '14 at 05:45
5

You can try this: my_string.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')

This should remove all space characters from the beginning and the end of the string, including the Unicode space variations.
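As a rough sketch of the difference between `\s` and `[[:space:]]`, the snippet below wraps the expression in a helper. The name `strip_unicode_space` and the sample string are assumptions made up for the example.

```ruby
# Hypothetical helper: trims any Unicode whitespace from both ends of a string.
def strip_unicode_space(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
end

# Sample with a non-breaking space, a tab, an ASCII space and an em space.
sample = "\u00A0\t hello world \u2003\u00A0"
trimmed = strip_unicode_space(sample)

# =~ returns nil when there is no match, otherwise the match position.
ascii_class_match = ("\u00A0" =~ /\s/)           # \s covers ASCII whitespace only
posix_class_match = ("\u00A0" =~ /[[:space:]]/)  # the POSIX class is Unicode-aware

puts trimmed.inspect
puts ascii_class_match.inspect
puts posix_class_match.inspect
```

Whitespace inside the string is left alone because both alternatives are anchored with `\A` and `\z`.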

Pavel Pravosud
2

Figure out the character code of the last character (str[-1].ord) and explicitly search and destroy it. Rinse/repeat if there exist more unwanted characters after that. After doing this, report back here what the invisible character was. (Perhaps it's only invisible because the font you are using does not have that glyph?)
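A minimal sketch of that inspection step might look like the following. The sample string stands in for the scraped value, and the assumption that it ends in U+00A0 is only for illustration.

```ruby
# Hypothetical scraped string; replace with the real value when debugging.
str = "some text\u00A0"

# Look at the last character and its code point.
last_char = str[-1]
code = last_char.ord
label = format("U+%04X", code)

# Once the offender is known, remove trailing runs of exactly that character.
cleaned = str.sub(/#{Regexp.escape(last_char)}+\z/, '')

puts label
puts str.codepoints.last(3).inspect
puts cleaned.inspect
```

Printing the last few code points with `str.codepoints` helps when more than one unwanted character is hiding at the end.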

Phrogz