Here are two ways to count the number of times a given substring appears in a string (the first being my preference). Note that (as confirmed by the OP) the substring 'aa' appears twice in the string 'aaa', and therefore five times in:
str = "aaabbccaaaaddbab"
1. Use String#scan with a regex that contains a positive lookahead that looks for the given substring
def count_em(str, substr)
  # Regexp.escape keeps any regex metacharacters in substr (e.g. ".") literal
  str.scan(/(?=#{Regexp.escape(substr)})/).count
end
count_em(str,"aa")
#=> 5
count_em(str,"ab")
#=> 2
Note:
"aaabbccaaaaddbab".scan(/(?=aa)/)
#=> ["", "", "", "", ""]
A positive lookbehind produces the same result:
"aaabbccaaaaddbab".scan(/(?<=aa)/)
#=> ["", "", "", "", ""]
As well, String#scan could be replaced with the form of String#gsub that takes one argument (here the same regular expression) and no block, and returns an enumerator. That form of gsub is unusual in that it has nothing to do with character replacement; it simply generates matches of the regular expression.
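As a minimal sketch of that gsub form: the method name count_with_gsub is hypothetical, and the call to Regexp.escape is an assumption added here so that a substring containing regex metacharacters is matched literally.

```ruby
# Hypothetical helper illustrating the block-less, one-argument form of gsub.
def count_with_gsub(str, substr)
  # With a pattern and no block, String#gsub returns an enumerator
  # that yields each match; nothing is replaced.
  str.gsub(/(?=#{Regexp.escape(substr)})/).count
end

str = "aaabbccaaaaddbab"
count_with_gsub(str, "aa") #=> 5
count_with_gsub(str, "ab") #=> 2
```

Because the lookahead matches are zero-width, overlapping occurrences are counted just as they are with scan.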
2. Convert the given substring to an array of characters, then apply String#each_char, Enumerable#each_cons and Enumerable#count to the string
def count_em(str, substr)
  subarr = substr.chars
  str.each_char
     .each_cons(substr.size)
     .count(subarr)
end
count_em(str,"aa")
#=> 5
count_em(str,"ab")
#=> 2
We have:
subarr = "aa".chars
#=> ["a", "a"]
enum0 = "aaabbccaaaaddbab".each_char
#=> #<Enumerator: "aaabbccaaaaddbab":each_char>
We can see the elements that will be generated by this enumerator by converting it to an array:
enum0.to_a
#=> ["a", "a", "a", "b", "b", "c", "c", "a", "a", "a",
# "a", "d", "d", "b", "a", "b"]
enum1 = enum0.each_cons("aa".size)
#=> #<Enumerator: #<Enumerator:
# "aaabbccaaaaddbab":each_char>:each_cons(2)>
Convert enum1 to an array to see what values the enumerator will pass on to count:
enum1.to_a
#=> [["a", "a"], ["a", "a"], ["a", "b"], ["b", "b"], ["b", "c"],
# ["c", "c"], ["c", "a"], ["a", "a"], ["a", "a"], ["a", "a"],
# ["a", "d"], ["d", "d"], ["d", "b"], ["b", "a"],
# ["a", "b"]]
enum1.count(subarr)
#=> enum1.count(["a", "a"])
#=> 5
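As a quick sanity check, the two approaches can be run side by side. This is only a sketch: the names count_scan and count_cons are hypothetical (both methods above are called count_em), the list of test substrings is arbitrary, and Regexp.escape is an assumption added so the substring is treated literally.

```ruby
# Approach 1: zero-width lookahead matches found by String#scan.
def count_scan(str, substr)
  str.scan(/(?=#{Regexp.escape(substr)})/).count
end

# Approach 2: sliding window of characters compared against substr.chars.
def count_cons(str, substr)
  str.each_char.each_cons(substr.size).count(substr.chars)
end

str = "aaabbccaaaaddbab"
%w[aa ab b dd xyz].each do |substr|
  # Print both counts for each (non-empty) substring; they should agree.
  puts "#{substr}: #{count_scan(str, substr)} #{count_cons(str, substr)}"
end
```

Both methods count overlapping occurrences, so they should report the same number for any non-empty substring.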