Say I have a huge body of text (~500 chars) stored in a string. How can I loop through the string and increment a variable by 1 every time I encounter the character 'a'?
-
Welcome to Stack Overflow! Did you try to solve this yourself already? If so, where exactly did you get stuck? In general, people don't respond too well to "please make this code for me" requests (which this looks like), but **do** respond well to "I tried this and now I'm stuck/confused, please help" requests ;-) – Martin Tournoij Apr 03 '16 at 02:06
-
500 chars is small, not huge. At 500 chars, it's hard to find a bad way to check the string. – Nathaniel Waisbrot Apr 03 '16 at 02:58
-
[Enumerate a string in Elixir](https://stackoverflow.com/questions/67791728/enumerate-a-string-in-elixir) – Adam Millerchip Jun 02 '21 at 06:03
2 Answers
I think there are more easily understandable approaches to this that might work just fine for you. Using a regex:
Regex.scan(~r/a/, str) |> Enum.count
or splitting the string into its Unicode graphemes and then counting the matches:
str |> String.graphemes |> Enum.count(fn(c) -> c == "a" end)
These are not very efficient approaches but the performance impact should be negligible with a (relatively small!) string that is only 500 chars long.
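As a minimal, runnable sketch of the two one-liners above: the module name `CountExamples` and the sample string are made up for illustration, while `Regex.scan/2`, `String.graphemes/1` and `Enum.count/2` are the calls already used in this answer.

```elixir
defmodule CountExamples do
  # Regex.scan/2 returns one entry per match, so the list length is the count.
  def regex_count(str) do
    ~r/a/ |> Regex.scan(str) |> length()
  end

  # String.graphemes/1 splits the string into user-perceived characters;
  # Enum.count/2 then counts the ones equal to "a".
  def graphemes_count(str) do
    str |> String.graphemes() |> Enum.count(fn c -> c == "a" end)
  end
end

# Hypothetical sample input.
str = "a banana and a papaya"

IO.inspect(CountExamples.regex_count(str))      # 9
IO.inspect(CountExamples.graphemes_count(str))  # 9
```

Both functions hard-code the character "a" to mirror the question; passing the target in as an argument would be the obvious next step.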
If you need a more efficient approach, a good option is often to iterate using recursion and count the occurrences manually. Although this approach is quite verbose, it performs much better.
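defmodule Recursive do
  # Entry point: the second argument must be a single-character string.
  def count(str, <<c::utf8>>) do
    do_count(str, c, 0)
  end

  # End of input: return the accumulated count.
  defp do_count(<<>>, _, acc) do
    acc
  end

  # The next codepoint equals the target (same variable `c`): increment.
  defp do_count(<<c::utf8, rest::binary>>, c, acc) do
    do_count(rest, c, acc + 1)
  end

  # Any other codepoint: skip it.
  defp do_count(<<_::utf8, rest::binary>>, c, acc) do
    do_count(rest, c, acc)
  end
end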
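For readers who think of this as "loop and increment a variable", a comprehension with an accumulator is probably the closest analogue. This is only a sketch and not part of the benchmark: it assumes Elixir 1.8 or later (for the `reduce:` option), and the module name `LoopCount` is made up for illustration.

```elixir
defmodule LoopCount do
  # `target` must be a single-character string such as "a".
  def count(str, <<target::utf8>>) do
    # Walk the binary one UTF-8 codepoint at a time, carrying the count in `acc`.
    for <<c::utf8 <- str>>, reduce: 0 do
      acc -> if c == target, do: acc + 1, else: acc
    end
  end
end

IO.inspect(LoopCount.count("banana", "a"))  # 3
```

There is no mutable variable here; `acc` is rebound on every step, which is how Elixir expresses a running counter.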
Finally, here's a benchmark, run with benchfella, of the approaches so far. I also included @DeboraMartins' "split length" solution, which outperforms all of the above for small strings. For larger strings, its advantage over the recursive approach is negligible.
# 500 Characters
split length        500000        5.90 µs/op
recursive           100000       10.63 µs/op
regex count         100000       24.35 µs/op
graphemes count      10000      118.29 µs/op

# 500,000 Characters
split length           100    11150.59 µs/op
recursive              100    12002.20 µs/op
regex count            100    25313.40 µs/op
graphemes count         10   218846.20 µs/op
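If you just want a rough sanity check without adding benchfella, something like the following could work. Note that this is not the setup that produced the numbers above: it swaps in Erlang's `:timer.tc/1`, the `RoughTiming` module and the test string are hypothetical, and the timings will be noisier than a real benchmark tool's.

```elixir
defmodule RoughTiming do
  # Runs `fun` `runs` times and returns {average_microseconds, last_result}.
  def measure(fun, runs \\ 1_000) do
    {total_us, result} =
      :timer.tc(fn ->
        Enum.reduce(1..runs, nil, fn _, _ -> fun.() end)
      end)

    {total_us / runs, result}
  end
end

# Hypothetical 500-character input containing 100 occurrences of "a".
str = String.duplicate("abcde", 100)

{split_us, split_count} = RoughTiming.measure(fn -> length(String.split(str, "a")) - 1 end)
{regex_us, regex_count} = RoughTiming.measure(fn -> length(Regex.scan(~r/a/, str)) end)

IO.puts("split length: #{split_count} matches, ~#{Float.round(split_us, 2)} µs/op")
IO.puts("regex count:  #{regex_count} matches, ~#{Float.round(regex_us, 2)} µs/op")
```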

My suggestion is:
countSubstring = fn
  (_, "") -> 0
  (str, sub) -> length(String.split(str, sub)) - 1
end
You can call it with IO.puts countSubstring.(str, "a")

-
This is very good, could you please explain how this works? I don't understand how it keeps track of a count for each "" in the parameter. Sorry, I am not used to Elixir; this is my first project in it and the syntax is not very friendly to me :S – frostmage Apr 03 '16 at 02:52
-
Although it's not the most intuitive solution, it performs extremely well and also handles multiple characters. See benchmarks in my answer. – Patrick Oscity Apr 03 '16 at 09:06
-
Take a look at https://learnxinyminutes.com/docs/elixir/ for a quick review of syntax involving anonymous functions. In a nutshell, Elixir functions (including anonymous ones) can define multiple clauses, and the clause that gets called is chosen by pattern matching and/or guards. The first clause handles the case where the input substring is empty. The second handles all other cases: String.split() returns a list, and the length of that list minus 1 equals the number of occurrences of the substring. Hope that helps. – Everett Feb 02 '18 at 05:55
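To make the explanation above concrete, here is a small sketch of the same split-and-count idea written as a named module. The module name `SplitCount` is made up for illustration; the logic is the same as the anonymous function in the answer.

```elixir
defmodule SplitCount do
  # First clause: an empty pattern has no meaningful occurrence count, so return 0.
  def count(_str, ""), do: 0

  # Second clause: n occurrences cut the string into n + 1 pieces.
  def count(str, sub), do: length(String.split(str, sub)) - 1
end

IO.inspect(String.split("banana", "a"))      # ["b", "n", "n", ""]
IO.inspect(SplitCount.count("banana", "a"))  # 3
IO.inspect(SplitCount.count("banana", "an")) # 2
```

The trailing empty string in the split result is what makes the arithmetic work when the text ends with the pattern.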