
Say I have a huge body of text (~500 chars) stored in a string. How can I loop through the string and increment a variable by 1 every time I encounter the character 'a'?

Peer Stritzinger
frostmage
  • Welcome to Stack Overflow! Did you try to solve this yourself already? If so, where did you get stuck exactly? In general, people don't respond too well to "please make this code for me"-requests (which this looks like), but **do** respond well to "I tried this and now I'm stuck/confused, please help"-requests ;-) – Martin Tournoij Apr 03 '16 at 02:06
  • 500 chars is small, not huge. At 500 chars, it's hard to find a bad way to check the string. – Nathaniel Waisbrot Apr 03 '16 at 02:58
  • [Enumerate a string in Elixir](https://stackoverflow.com/questions/67791728/enumerate-a-string-in-elixir) – Adam Millerchip Jun 02 '21 at 06:03

2 Answers


I think there are more easily understandable approaches to this that might work just fine for you. Using a regex:

Regex.scan(~r/a/, str) |> Enum.count()

or splitting the string into its Unicode graphemes and counting the ones that match:

str |> String.graphemes() |> Enum.count(fn c -> c == "a" end)

These are not very efficient approaches but the performance impact should be negligible with a (relatively small!) string that is only 500 chars long.
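As a quick sanity check, here is a minimal sketch that runs both one-liners on a short sample string and confirms that they agree. The `str` value is made up for illustration and is not from the question.

```elixir
# Hypothetical sample input; any string works here.
str = "a banana and a papaya"

# Approach 1: collect every regex match of "a" and count the matches.
regex_count = ~r/a/ |> Regex.scan(str) |> length()

# Approach 2: split into graphemes and count those equal to "a".
grapheme_count = str |> String.graphemes() |> Enum.count(&(&1 == "a"))

IO.inspect({regex_count, grapheme_count})
# prints {9, 9}
```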

If you need a more efficient approach, a good option is often to iterate with recursion and count the occurrences manually. Although this approach is quite verbose, it performs much better.

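defmodule Recursive do
  # Public entry point: the second argument must be a single UTF-8
  # codepoint such as "a".
  def count(str, <<c::utf8>>) do
    do_count(str, c, 0)
  end

  # End of input: return the accumulated count.
  defp do_count(<<>>, _, acc) do
    acc
  end

  # The next codepoint equals `c`: count it and continue with the rest.
  defp do_count(<<c::utf8, rest::binary>>, c, acc) do
    do_count(rest, c, acc + 1)
  end

  # Any other codepoint: skip it.
  defp do_count(<<_::utf8, rest::binary>>, c, acc) do
    do_count(rest, c, acc)
  end
end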
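If the explicit recursion feels heavy, the same walk over the codepoints can probably be expressed with a `for` comprehension and its `:reduce` option, which is closer to the "loop and increment a variable" idea in the question. This is an alternative sketch, not the code that was benchmarked; the module name `Comprehension` is hypothetical, and it assumes Elixir 1.8 or later, where `:reduce` is available.

```elixir
defmodule Comprehension do
  # Hypothetical module name, used only for this illustration.
  # Counts how often the single codepoint in the second argument
  # (e.g. "a") occurs in `str`.
  def count(str, <<c::utf8>>) do
    # Walk the binary one UTF-8 codepoint at a time, keep only the
    # codepoints equal to `c`, and add 1 to the accumulator for each.
    for <<x::utf8 <- str>>, x == c, reduce: 0 do
      acc -> acc + 1
    end
  end
end

IO.inspect(Comprehension.count("banana", "a"))
# prints 3
```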

Finally, here's a benchmark of the approaches so far, run with benchfella. I also included @DeboraMartins' "split length" solution, which outperforms all of the above for small strings. For larger strings, its advantage over the recursive approach is negligible.

# 500 Characters

split length         500000   5.90 µs/op
recursive            100000   10.63 µs/op
regex count          100000   24.35 µs/op
graphemes count       10000   118.29 µs/op


# 500,000 Characters

split length            100   11150.59 µs/op
recursive               100   12002.20 µs/op
regex count             100   25313.40 µs/op
graphemes count          10   218846.20 µs/op
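To get a rough feel for these numbers without installing anything, a sketch along the following lines might help. It uses Erlang's `:timer.tc/1` rather than benchfella, times each approach only once, and runs on a made-up 500-character input, so the timings will be noisy and should not be compared directly with the table above.

```elixir
# Hypothetical test input: 500 characters, 100 of them "a".
str = String.duplicate("abcde", 100)

# The one-liners from this thread, wrapped as zero-arity functions.
approaches = [
  {"split length", fn -> length(String.split(str, "a")) - 1 end},
  {"regex count", fn -> ~r/a/ |> Regex.scan(str) |> length() end},
  {"graphemes count", fn -> str |> String.graphemes() |> Enum.count(&(&1 == "a")) end}
]

results =
  for {name, fun} <- approaches do
    # :timer.tc/1 returns {elapsed_microseconds, result_of_fun}.
    {micros, count} = :timer.tc(fun)
    IO.puts("#{name}: #{count} matches in #{micros} µs")
    {name, count}
  end
```

A benchmarking library repeats each function many times and averages the result, which is why it is the better tool for real measurements.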
Patrick Oscity

My suggested code is:

count_substring = fn
  _, "" -> 0
  str, sub -> length(String.split(str, sub)) - 1
end

You can call it with IO.puts count_substring.(str, "a")
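To see why the split trick works, it may help to look at the intermediate list. The sketch below uses made-up sample strings: cutting a string at every occurrence of the substring leaves one more piece than there are occurrences, so the count is the list length minus 1.

```elixir
# Splitting on "a" cuts the string at every occurrence, so n occurrences
# always produce n + 1 pieces (some of which may be empty strings).
pieces = String.split("banana", "a")
IO.inspect(pieces)
# prints ["b", "n", "n", ""]

count = length(pieces) - 1
IO.inspect(count)
# prints 3

# The same arithmetic holds for multi-character substrings.
multi = length(String.split("abcabc", "bc")) - 1
IO.inspect(multi)
# prints 2
```

The separate clause for an empty substring is there because splitting on "" does not follow this "one cut per occurrence" logic, so the function simply returns 0 in that case.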

Debora Martins
  • This is very good, could you please explain how this works? I don't understand how it keeps track of a count for each "" in the parameter. Sorry, I am not used to Elixir; this is my first project in it and the syntax is not very friendly to me :S – frostmage Apr 03 '16 at 02:52
  • Although it's not the most intuitive solution, it performs extremely well and also handles multiple characters. See benchmarks in my answer. – Patrick Oscity Apr 03 '16 at 09:06
  • Take a look at https://learnxinyminutes.com/docs/elixir/ for a quick review of syntax involving anonymous functions. In a nutshell, Elixir functions (including anonymous ones) can define multiple clauses, and the clause that gets called is chosen by pattern matching and/or guards. The 1st clause handles the case where the input substring is empty. The 2nd handles all other cases -- String.split() returns a list, and the length of that list minus 1 corresponds to the number of occurrences of the substring. Hope that helps. – Everett Feb 02 '18 at 05:55