
I have a string as follows:

--d--d-d---d--

I want to find all occurrences of 'd' in this string with their offsets.

However, doing the following only gives me back the first result:

irb(main):001:0> m = /d/.match "d--d-d---d"
=> #<MatchData "d">
irb(main):002:0> m.size
=> 1 

What am I doing wrong? I thought match would return all occurrences of the regex in the string.

mydoghasworms

4 Answers


To get the offset, you can use a loop like this:

s = '--d--d-d---d--'
offset = 0
while md = /d/.match(s, offset)
  # MatchData#offset returns a two-element array containing
  # the beginning and ending offsets of the match.
  start, finish = md.offset(0)
  p start          # offset where this match begins
  offset = finish  # resume the search just past this match
end
halfelf
  • Thanks, but I still don't understand why it does not contain all the matches, which according to the core doc (http://www.ruby-doc.org/core-2.0/MatchData.html) sounds like it should be the case, unless I am missing something. – mydoghasworms Mar 22 '13 at 11:42
  • @mydoghasworms `#match` only runs the regexp once. `MatchData` contains offsets for the whole match and for each capture group. – dbenhur Mar 22 '13 at 15:35
  • Why loop over `match` & matchdata when `index` returns the desired offsets directly? – dbenhur Mar 22 '13 at 15:59

The answer I was looking for is in fact in this question: How do I get the match data for all occurrences of a Ruby regular expression in a string?

Like I said, I thought the MatchData result should contain all occurrences of the match. (I got this impression from the Ruby core doc here: http://www.ruby-doc.org/core-2.0/MatchData.html).

So while I still don't understand that part completely, at least the answer above helps me to get to all the occurrences.
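
For reference, here is a minimal sketch of one common way to visit every occurrence. It is not necessarily the approach used in the linked answer. It assumes that String#scan, when given a block, updates the last-match data on each iteration, so Regexp.last_match is the MatchData for the current occurrence. The helper name match_offsets is made up for this example.

```ruby
# Hypothetical helper (not from the linked answer): collects the starting
# offset of every match of `pattern` in `str`.
def match_offsets(str, pattern)
  offsets = []
  # String#scan with a block sets the last-match data ($~) on each iteration,
  # so Regexp.last_match is the MatchData for the current occurrence.
  str.scan(pattern) { offsets << Regexp.last_match.begin(0) }
  offsets
end

p match_offsets('--d--d-d---d--', /d/)  # prints [2, 5, 7, 11]
```

Collecting Regexp.last_match itself instead of begin(0) would give the full MatchData for each occurrence, including any capture groups.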

mydoghasworms

As a variant:

str = '--d--d-d---d--'
str.each_char.with_index.select{|el| el[0] == "d"}.map(&:last)

Result:

[2, 5, 7, 11]

These are the positions of the letter, counted from 0. If you need them counted from 1, use with_index(1); the result is then:

[3, 6, 8, 12]
Yevgeniy Anfilofyev

Regexp#match only runs the pattern once. MatchData can contain multiple entries and thus multiple offsets: the first is the entire match, and the others are the contents of the capture groups within the regexp. Nothing in MatchData results from multiple applications of the regexp.
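
To illustrate where the multiple offsets come from, here is a small sketch using a pattern with two capture groups. The helper name group_offsets and the pattern /(d)-(d)/ are invented for this example.

```ruby
# Hypothetical helper for illustration: returns the [begin, end] offsets of
# the whole match followed by those of each capture group, or nil if no match.
def group_offsets(str, pattern)
  md = pattern.match(str)
  return nil unless md
  # md.size counts the whole match (index 0) plus every capture group.
  (0...md.size).map { |i| md.offset(i) }
end

p group_offsets('--d--d-d---d--', /(d)-(d)/)  # prints [[5, 8], [5, 6], [7, 8]]
```

All three entries describe the single match "d-d" found at offset 5. The other occurrences of 'd' in the string do not appear because the pattern was applied only once.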

String#index produces offsets directly and can be easily used to iterate through the string.

s = '--d--d-d---d--'
[].tap{ |offsets| i=-1; while i = s.index('d', i+1); offsets << i; end }
=> [2, 5, 7, 11]
dbenhur