Using Ruby 2.4. I have an array of strings. I want to strip non-breaking and breaking spaces from the end of each item in the array, and also replace multiple consecutive whitespace characters with a single space. I thought the code below was the way to do it, but I get an error:

 > words = ["1", "HUMPHRIES \t\t\t\t\t\t\t\t\t\t\t\t\t\t, \t\t\t\t\t\t\t\t\t\t\t\t\tJASON", "328", "FAIRVIEW, OR (US)", "US", "M", " 27 ", "00:27:30.00 \t\t\t\t\t\t\t\t\t\t\t \n"]

 > words.map{|word| word ? word.gsub!(/\A\p{Space}+|\p{Space}+\z/, '').gsub!(/[[:space:]]+/, ' ') : nil }
NoMethodError: undefined method `gsub!' for nil:NilClass
    from (irb):4:in `block in irb_binding'
    from (irb):4:in `map'
    from (irb):4
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/console.rb:65:in `start'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/console_helper.rb:9:in `start'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/commands_tasks.rb:78:in `console'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/commands_tasks.rb:49:in `run_command!'
    from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands.rb:18:in `<top (required)>'
    from bin/rails:4:in `require'
    from bin/rails:4:in `<main>'

How can I properly replace consecutive occurrences of white space as well as strip it off from each word in the array?

idej
  • Try something like this: http://stackoverflow.com/questions/41306355/how-to-replace-the-characters-in-a-string – agm1984 Apr 23 '17 at 23:21
  • Thanks but this answer doesn't address how to remove non-breaking/breaking space from the ends of each word in the array. –  Apr 23 '17 at 23:24
  • @Natalia Can we tidy up your question a little? Remove the irb command prompts `2.4.0 :003` etc; put a `#` in front of outputs and correct your typos. Thanks. – Sagar Pandya Apr 24 '17 at 01:41
  • there are alternatives to `gsub` - `tr`, `delete` http://stackoverflow.com/questions/26749065/what-is-the-difference-between-tr-and-gsub – Bohdan Apr 24 '17 at 04:56

3 Answers

Use plain gsub rather than gsub!:

words.map do |w|
  # respond_to?(:gsub) guards against elements that are not strings (nil, for example)
  w.gsub(/(?<=[^\,\.])\s+|\A\s+/, '') if w.respond_to?(:gsub)
end

gsub! returns nil when it makes no changes to the string, so the chained second gsub! ends up being called on nil. That is why you get the "undefined method `gsub!' for nil:NilClass" error.

From the Ruby documentation for gsub!:

Performs the substitutions of String#gsub in place, returning str, or nil if no substitutions were performed. If no block and no replacement is given, an enumerator is returned instead.
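
A small sketch of that behavior, using made-up strings rather than the data from the question:

```ruby
# gsub! mutates the receiver and returns nil when nothing matched.
changed   = "a  b".dup
unchanged = "ab".dup

result_changed   = changed.gsub!(/ +/, ' ')    # a substitution happened
result_unchanged = unchanged.gsub!(/ +/, ' ')  # nothing to substitute

p result_changed    # "a b"
p result_unchanged  # nil

# The non-bang version always returns a string, so it can be chained safely.
chained = "ab".gsub(/ +/, ' ').gsub(/b/, 'c')
p chained  # "ac"
```

Chaining another method onto result_unchanged is what raises the NoMethodError.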

As @CarySwoveland mentioned in the comments, \s doesn't match non-breaking spaces. To handle them, use [[:space:]] instead of \s.
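
Putting the two points together, one possible sketch follows. The helper name normalize_space and the sample array are assumptions made for illustration, and the helper strips both ends of each string, as the regex in the question does:

```ruby
# Hypothetical helper: strips whitespace (including non-breaking spaces)
# from both ends, then collapses each inner run into a single space.
def normalize_space(word)
  return nil unless word.respond_to?(:gsub)   # non-strings come back as nil

  word.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
      .gsub(/[[:space:]]+/, ' ')
end

sample  = ["1", "HUMPHRIES \t\t, \t\tJASON", " 27\u00A0", "00:27:30.00 \t \n", nil]
cleaned = sample.map { |w| normalize_space(w) }
p cleaned
#=> ["1", "HUMPHRIES , JASON", "27", "00:27:30.00", nil]
```

Because only the non-bang gsub is used, each call returns a string and the chain never hits nil.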

idej
  • I don't understand your code. When it's run `self` is `main` and `main` doesn't have a method `gsub`. Also, it doesn't remove non-breaking spaces (e.g., [Unicode's non-break space](https://www.cs.tut.fi/~jkorpela/chars/spaces.html) `\00A0`), a requirement of the spec. – Cary Swoveland Apr 24 '17 at 21:11
  • @CarySwoveland Thanks, it was a typo in the `gsub` call after the last edit. The regex was not the problem in this question, but you are right: `\s` doesn't match a non-breaking space. That is easy to handle by changing `\s` to `[[:space:]]` – idej Apr 24 '17 at 23:17

You can use the following:

words.map { |w| w.gsub(/(?<=[^\,\.])\s+/,'') }
 #=> ["1", "HUMPHRIES, JASON", "328", "FAIRVIEW, OR(US)",
 #    "US", "M", " 27", "00:27:30.00"]
Sagar Pandya

I assume all whitespace and non-breaking spaces at the end of each string are to be removed and, of what's left, all substrings of whitespace characters and non-breaking spaces are to be replaced by one space. (Natalia, if that's not correct please let me know in a comment.)

words =
  ["1",
   "HUMPHRIES \t\t\t\, \t\t\t\t\t\t\t\t\t\t\t\t\tJASON",
   " M\u00A0    \u00A0",
   "    27 ",
   "00:27:30.00 \t\t\t\t\t\t\t\t\t\t\t \n"]

R = /
    [[:space:]]     # match a POSIX bracket expression for one character
    (?=[[:space:]]) # match a POSIX bracket expression in a positive lookahead
    |               # or
    [[:space:]]+    # match a POSIX bracket expression one or more times
    \z              # match end of string
    /x              # free-spacing regex definition mode

words.map { |w| w.gsub(R, '').gsub(/[[:space:]]/, ' ') }
  #=> ["1", "HUMPHRIES , JASON", " M", " 27", "00:27:30.00"]

Note that the POSIX [[:space:]] includes ASCII whitespace and Unicode's non-breaking space character, \u00A0.
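
A quick way to check this, as a sketch that assumes Ruby 2.4 or later for String#match?:

```ruby
nbsp = "\u00A0"   # Unicode non-breaking space

backslash_s = nbsp.match?(/\s/)            # \s covers ASCII whitespace only
posix_space = nbsp.match?(/[[:space:]]/)   # POSIX bracket expression
prop_space  = nbsp.match?(/\p{Space}/)     # Unicode property, as in the question
tab_space   = "\t".match?(/[[:space:]]/)   # ASCII whitespace still matches

p backslash_s  # false
p posix_space  # true
p prop_space   # true
p tab_space    # true
```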

To see why the second gsub is needed, note that

words.map { |w| w.gsub(R, '') }
  #=> ["1", "HUMPHRIES\t,\tJASON", " M", " 27", "00:27:30.00"] 
Cary Swoveland