13

I'm trying to split a string like Presentation about "Test Driven Development" into an array like this:

[ 'Presentation',
  'about',
  '"Test Driven Development"' ]

I have tried CSV::parse_line(string, col_sep: ' '), but this results in

[ 'Presentation',
  'about',
  'Test Driven Development' ] # I'm missing the quotes here

I also tried some regexp magic, but I'm still a beginner and didn't succeed. I guess this is quite simple for a pro, so maybe someone could point me in the right direction? Thanks.

Joshua Muheim

3 Answers

22

You may use the following regular expression split:

str = 'Presentation about "Test Driven Development"'
p str.split(/\s(?=(?:[^"]|"[^"]*")*$)/)
# => ["Presentation", "about", "\"Test Driven Development\""]

It splits on a whitespace character, but only if the text from that point to the end of the string contains an even number of " characters, which means the space is not inside a quoted section. Be aware that this version only works if every quote in your string is properly closed.
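To see that caveat in practice, here is a small sketch that runs the same split on a balanced and an unbalanced input. The constant name and the second sample string are made up for illustration.

```ruby
# Split on whitespace that is followed by an even number of double quotes.
QUOTE_AWARE_SPLIT = /\s(?=(?:[^"]|"[^"]*")*$)/

balanced   = 'Presentation about "Test Driven Development"'
unbalanced = 'a "b c' # illustrative input: the quote is never closed

balanced_tokens   = balanced.split(QUOTE_AWARE_SPLIT)
unbalanced_tokens = unbalanced.split(QUOTE_AWARE_SPLIT)

p balanced_tokens
# => ["Presentation", "about", "\"Test Driven Development\""]
p unbalanced_tokens
# => ["a \"b", "c"]
```

In the unbalanced case the first space is followed by an odd number of quotes, so no split happens there, while the space after the stray quote is still treated as a separator.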

An alternative solution uses scan to read the parts of the string (besides spaces):

p str.scan(/(?:\w|"[^"]*")+/)
# => ["Presentation", "about", "\"Test Driven Development\""]
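One thing to keep in mind with the scan version: \w only matches letters, digits and underscores, so punctuation outside quotes is dropped and hyphenated words are broken apart. The sketch below compares it with a looser variant that accepts any non-space, non-quote character. The looser pattern and the sample string are my own assumptions, not part of the answer above.

```ruby
str = 'Talk on "Test Driven Development", part-2' # illustrative input

# Pattern from the answer: tokens are runs of \w or quoted sections.
word_tokens = str.scan(/(?:\w|"[^"]*")+/)
p word_tokens
# => ["Talk", "on", "\"Test Driven Development\"", "part", "2"]

# Assumed variant: allow any character that is neither whitespace nor a quote.
loose_tokens = str.scan(/(?:[^\s"]|"[^"]*")+/)
p loose_tokens
# => ["Talk", "on", "\"Test Driven Development\",", "part-2"]
```

Note that the looser variant keeps the trailing comma attached to the quoted token, which may or may not be what you want.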
Howard
  • Thank you, works like a charm! Regexes are magic... I will try to dissect them and understand it. – Joshua Muheim Jul 19 '12 at 18:06
  • For your reference, I used this to remove empty elements and strip quotes and spaces: "a b 'c d '".split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/).select {|s| not s.empty? }.map {|s| s.gsub(/(^ +)|( +$)|(^["']+)|(["']+$)/,'')} – Keymon Jan 02 '15 at 12:11
4

Just to extend the previous answer from Howard, you can add this method:

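class String
  # Split on whitespace outside single or double quotes, drop the empty
  # tokens left by repeated spaces, then trim spaces and surrounding quotes.
  def tokenize
    split(/\s(?=(?:[^'"]|'[^']*'|"[^"]*")*$)/).
      reject(&:empty?).
      map { |s| s.gsub(/(^ +)|( +$)|(^["']+)|(["']+$)/, '') }
  end
end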

And the result:

> 'Presentation      about "Test Driven Development"  '.tokenize
=> ["Presentation", "about", "Test Driven Development"]
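If you would rather not reopen String, roughly the same result can be had from a standalone method built on scan with capture groups. This is only a sketch: the name tokenize_quoted is hypothetical, and it behaves slightly differently from the method above, since whitespace inside quotes is kept as-is and an unclosed quote character stays attached to its word.

```ruby
# Hypothetical helper: returns tokens with the surrounding quotes removed.
def tokenize_quoted(text)
  # Each match fills exactly one capture group: the body of a double-quoted
  # section, the body of a single-quoted section, or a run of non-whitespace.
  text.scan(/"([^"]*)"|'([^']*)'|(\S+)/).map { |groups| groups.compact.first }
end

p tokenize_quoted('Presentation      about "Test Driven Development"  ')
# => ["Presentation", "about", "Test Driven Development"]
```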
Keymon
0

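Here is another scan-based approach. It matches either a word with optional surrounding whitespace or a quoted run of word characters and spaces, then strips the extra whitespace:

"Presentation about \"Test Driven Development\"".scan(/\s?\w+\s?|"[\w\s]*"/).map {|s| s.strip}
# => ["Presentation", "about", "\"Test Driven Development\""]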
Linuxios