
I'm trying to count the number of times a string appears in another string.

I know you can count the number of times a letter appears in a string:

string = "aabbccddbb"
string.count('a')
=> 2

But if I search for how many times 'aa' appears in this string, I also get two.

string.count('aa')
=> 2

I don't understand this. I put the value in quotation marks, so I'm searching for the number of times the exact string appears, not just the letters.

Flip
Johnson

3 Answers


Here are two ways to count the number of times a given substring appears in a string (the first being my preference). Note that, as confirmed by the OP, overlapping occurrences are counted: the substring 'aa' appears twice in the string 'aaa', and therefore five times in:

str = "aaabbccaaaaddbab"

1. Use String#scan with a regex that contains a positive lookahead that looks for the given substring

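def count_em(str, substr)
  # Each zero-width lookahead match marks a position where substr begins;
  # Regexp.escape makes characters such as "." or "*" match literally
  str.scan(/(?=#{Regexp.escape(substr)})/).count
end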
count_em(str,"aa")
  #=> 5
count_em(str,"ab")
  #=> 2

Note:

"aaabbccaaaaddbab".scan(/(?=aa)/)
  #=> ["", "", "", "", ""]

A positive lookbehind produces the same result:

"aaabbccaaaaddbab".scan(/(?<=aa)/)
  #=> ["", "", "", "", ""]

As well, String#scan could be replaced with the form of String#gsub that takes one argument (here the same regular expression) and no block, and returns an enumerator. That form of gsub is unusual in that it has nothing to do with character replacement; it simply generates the matches of the regular expression.
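A minimal sketch of that gsub variant, assuming the same interface as count_em (the name count_em_gsub is hypothetical). Calling gsub with a pattern, no replacement and no block returns an Enumerator over the matches, and Enumerable#count then counts them; Regexp.escape is added so that a substring such as "." is matched literally.

```ruby
# Hypothetical helper: counts overlapping occurrences of substr in str.
# gsub with only a pattern (no replacement, no block) returns an
# Enumerator over the matches rather than a new string.
def count_em_gsub(str, substr)
  str.gsub(/(?=#{Regexp.escape(substr)})/).count
end

count_em_gsub("aaabbccaaaaddbab", "aa")
  #=> 5
count_em_gsub("aaabbccaaaaddbab", "ab")
  #=> 2
```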

2. Convert the substring to an array of characters, then chain String#each_char, Enumerable#each_cons and Enumerable#count on the string

def count_em(str, substr)
  subarr = substr.chars
  str.each_char
     .each_cons(substr.size)
     .count(subarr)
end
count_em(str,"aa")
  #=> 5
count_em(str,"ab")
  #=> 2

We have:

subarr = "aa".chars
  #=> ["a", "a"]
enum0 = "aaabbccaaaaddbab".each_char
  #=> #<Enumerator: "aaabbccaaaaddbab":each_char>

We can see the elements that will be generated by this enumerator by converting it to an array:

enum0.to_a
  #=> ["a", "a", "a", "b", "b", "c", "c", "a", "a", "a",
  #    "a", "d", "d", "b", "a", "b"]

enum1 = enum0.each_cons("aa".size)
  #=> #<Enumerator: #<Enumerator:
  #      "aaabbccaaaaddbab":each_char>:each_cons(2)> 

Convert enum1 to an array to see what values the enumerator will pass on to count:

enum1.to_a
  #=> [["a", "a"], ["a", "a"], ["a", "b"], ["b", "b"], ["b", "c"],
  #    ["c", "c"], ["c", "a"], ["a", "a"], ["a", "a"], ["a", "a"], 
  #    ["a", "d"], ["d", "d"], ["d", "b"], ["b", "a"],
  #    ["a", "b"]]
 
enum1.count(subarr)
  #=> enum1.count(["a", "a"])
  #=> 5
Cary Swoveland

It's because String#count counts characters, not occurrences of substrings. Its argument is treated as a set of characters to count, so 'aa' means the same thing as 'a'.
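The character-set behaviour is easiest to see by passing count arguments other than a single letter. The sketch below reuses the question's string; ranges and negation follow the same set syntax that String#tr uses.

```ruby
s = "aabbccddbb"

# Duplicate letters in the argument change nothing: "aa" is the set {a}
s.count("a")    #=> 2
s.count("aa")   #=> 2
# "ab" counts every "a" and every "b": 2 + 4
s.count("ab")   #=> 6
# A range counts a, b and c: 2 + 4 + 2
s.count("a-c")  #=> 8
# A leading ^ negates the set: every character except "a"
s.count("^a")   #=> 8
```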

To count the number of times aa appears in the string:

string = "aabbccddbb"
string.scan(/aa/).length
# => 1
string.scan(/bb/).length
# => 2
string.scan(/ff/).length
# => 0
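Two details matter when choosing between this and the lookahead approach from the other answer: scan also accepts a plain String, which is matched literally, and scan returns non-overlapping matches only, so it counts 'aa' once in 'aaa'. A short comparison:

```ruby
string = "aabbccddbb"

# A String pattern is matched literally, no Regexp needed
string.scan("bb").length     #=> 2

# After "aa" matches at index 0, the search resumes at index 2,
# so only one non-overlapping match is found
"aaa".scan("aa").length      #=> 1

# A zero-width lookahead consumes nothing, so overlapping
# occurrences are each counted
"aaa".scan(/(?=aa)/).length  #=> 2
```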
tadman
  • I see, to find the count of actual strings, you use the scan method instead of the count method. Thank you. – Johnson Sep 19 '14 at 16:34
    Yeah [`scan`](http://www.ruby-doc.org/core-2.1.2/String.html#method-i-scan) takes a regular expression like `/aa/` or even a string like `"aa"` if you prefer and returns the matches. `length` tells you how many matches if you don't care what the matches are. – tadman Sep 19 '14 at 16:35
    You can also use count or size instead of length – Johnson Sep 19 '14 at 16:39
    There is no reason to use a Regexp instead of a String in this example. – Ismael Abreu Sep 19 '14 at 17:12
  • Good answer. Worked on ruby 2.1.5 ! – Nishant Kumar Dec 15 '22 at 12:24

Try using string.split('a').count - 1

  • Welcome to StackOverflow. Could you possibly elaborate on your answer a bit? It would help others in the future who might have this same question if you could explain the logic behind your solution! – Jeffrey Feb 12 '20 at 21:28
  • "a".split('a').count == 0; "ba".split('a').count == 1; "bad".split('a').count == 2; – oklas Mar 12 '20 at 20:24
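The comment above shows why split('a').count - 1 is unreliable: split drops trailing empty fields, so an occurrence at the very end of the string is lost ("ba" gives 0, "a" gives -1). A sketch of a repaired version, assuming a non-empty substring (the helper name count_by_split is hypothetical), passes a limit of -1 to keep those fields. Like scan, it counts non-overlapping occurrences only.

```ruby
# Hypothetical helper: counts non-overlapping occurrences via split.
# Assumes substr is non-empty.
def count_by_split(str, substr)
  # "".split(...) returns [] in Ruby, which would otherwise give -1
  return 0 if str.empty?
  # An escaped Regexp avoids split's special whitespace handling of " ";
  # a limit of -1 keeps trailing empty fields, e.g. "ba" -> ["b", ""]
  str.split(Regexp.new(Regexp.escape(substr)), -1).count - 1
end

count_by_split("a", "a")            #=> 1
count_by_split("ba", "a")           #=> 1
count_by_split("bad", "a")          #=> 1
count_by_split("aabbccddbb", "bb")  #=> 2
count_by_split("aaa", "aa")         #=> 1 (non-overlapping)
```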