Using Ruby 2.2

I have strings like the following:

  • Weekly on Tuesday and Friday
  • Weekly on Monday, Wednesday and Saturday
  • Monthly every 2 weeks on Monday

To extract the days of the week from the strings shown above, I have written the following regex:

/\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b/

When I use the String#match instance method, the returned MatchData doesn't contain all matches. For example, in the irb output below, matching the string Weekly on Tuesday and Friday against the regex above yields a MatchData containing just Tuesday. I expected it to contain Friday too.

  2.2.1 :001 > str = "Weekly on Tuesday and Friday"
  => "Weekly on Tuesday and Friday" 
  2.2.1 :002 > regex = /\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b/
  => /\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b/ 
  2.2.1 :003 > str.match(regex)
  => #<MatchData "Tuesday" 1:"Tuesday"> 
  2.2.1 :004 > match_data = str.match(regex)
  => #<MatchData "Tuesday" 1:"Tuesday"> 
  2.2.1 :005 > match_data.captures
  => ["Tuesday"] 

Can anybody please explain why the MatchData contains only the first matched term, even though I haven't used any start/end anchors in my regex? I am sure my regex is missing something, but I am unable to figure out what.

Note

Rubular shows the correct match groups for the same regex, as can be seen at http://rubular.com/r/XZmrHPkjEk

Jignesh Gohel
  • Did you try using `str.scan(/\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b/)` ? Is it what you are looking for? http://ruby-doc.com/docs/ProgrammingRuby/html/ref_c_string.html#String.scan. – Wiktor Stribiżew Apr 15 '15 at 22:00
  • @stribizhev Yes `str.scan(regex)` returns desired results. But I need to understand why `str.match(regex)` is not returning all matches. – Jignesh Gohel Apr 15 '15 at 22:05
  • Because it's not supposed to? Regexp#match just returns a match for the regex. Your regex only looks for a single day of the week and is matched as such. – Chris Heald Apr 15 '15 at 22:10
  • @ChrisHeald In that case can you please suggest the fix in my regex such that it repeatedly matches days of week present in the string. I tried using the repetition quantifiers in my regex but it seems like I am making some mistake in using those along with the word boundary. – Jignesh Gohel Apr 15 '15 at 22:20
  • You can't get an arbitrary number of match group results out of Regexp#match. You can get the entire substring which matches multiple days, but not individual captures of the days. You need to be using #scan for that. – Chris Heald Apr 15 '15 at 22:25
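The last comment's point can be checked with a small sketch. The pattern below is a hypothetical variant (not the asker's regex) that wraps the day group in a repetition; the separator alternation `, | and ` is an assumption based on the sample strings. The whole run of days ends up in the overall match, but the capture group keeps only its last iteration.

```ruby
# Hypothetical variant of the question's regex: the day group is repeated,
# with an optional ", " or " and " separator after each day.
days = %w[Monday Tuesday Wednesday Thursday Friday Saturday Sunday].join("|")
repeated_regex = /(?:\b(#{days})\b(?:, | and )?)+/

str = "Weekly on Tuesday and Friday"
match_data = str.match(repeated_regex)

whole_match = match_data[0]       # the entire substring spanning both days
captures = match_data.captures    # only the last iteration of the group survives

puts whole_match
puts captures.inspect
```

So adding a quantifier widens the overall match but does not produce one capture per day, which is why `scan` is suggested instead.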

1 Answer

It seems that the MatchData returned by the .match() method describes only the first match, together with any groups captured within it. I have just tested it, and I could only get one match out of .match().
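As a rough illustration of that behaviour, the sketch below uses the regex and string from the question. It relies on the optional second argument of `String#match` (a start position) to show that a later day is only found when the search is explicitly restarted after the previous match; the variable names are just for illustration.

```ruby
day_regex = /\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b/
str = "Weekly on Tuesday and Friday"

# A plain call stops at the first match.
first = str.match(day_regex)
first_day = first[1]

# Restart the search just past the end of the first match.
second = str.match(day_regex, first.end(0))
second_day = second[1]

# No more days after the second match, so this is nil.
third = str.match(day_regex, second.end(0))

puts first_day
puts second_day
puts third.inspect
```

Looping like this by hand is essentially what `scan` does for you.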

See the details on Regular-Expressions.info:

To test if a particular regex matches (part of) a string, you can either use the =~ operator, call the regexp object's match() method, e.g.: print "success" if subject =~ /regex/ or print "success" if /regex/.match(subject).

Also, from here:

String.=~(Regexp) returns the starting position of the first match or nil if no match was found

To obtain all matches, you need to use the .scan() method.
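A minimal sketch of `scan` with the regex from the question, applied to one of the sample strings. Note that when the pattern contains a capture group, `scan` returns an array of capture arrays, so you may want to call `flatten` or switch to a non-capturing group `(?:...)`; the second regex below is that variant and is not part of the original question.

```ruby
str = "Weekly on Monday, Wednesday and Saturday"

# Regex from the question: one capture group, so scan yields nested arrays.
day_regex = /\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b/
nested = str.scan(day_regex)
flat = nested.flatten

# Same pattern with a non-capturing group: scan yields the matched strings directly.
plain_regex = /\b(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b/
days = str.scan(plain_regex)

puts nested.inspect
puts flat.inspect
puts days.inspect
```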

Wiktor Stribiżew