You can use a comprehension with a bitstring generator and the reduce
option to count the codepoints without building up the intermediate list.
for <<_::utf8 <- string>>, reduce: 0, do: (count -> count + 1)
Example:
iex> string = "♂️"
iex> for <<_::utf8 <- string>>, reduce: 0, do: (count -> count + 1)
2
iex> string |> String.codepoints |> length
2
iex> String.length(string)
1
This has the added bonus that it also works with UTF-16 and UTF-32 strings, if you replace utf8 with utf16 or utf32:
iex> utf8_string = "I'm going to be UTF-16!"
"I'm going to be UTF-16!"
iex> utf16_string = :unicode.characters_to_binary(utf8_string, :utf8, :utf16)
<<0, 73, 0, 39, 0, 109, 0, 32, 0, 103, 0, 111, 0, 105, 0, 110, 0, 103, 0, 32, 0,
116, 0, 111, 0, 32, 0, 98, 0, 101, 0, 32, 0, 85, 0, 84, 0, 70, 0, 45, 0, 49,
0, 54, 0, 33>>
iex> for <<_::utf8 <- utf8_string>>, reduce: 0, do: (count -> count + 1)
23
iex> for <<_::utf16 <- utf16_string>>, reduce: 0, do: (count -> count + 1)
23
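If you need this for more than one encoding, the comprehension can be wrapped in a small helper. The following is only a sketch: the module name `CodepointCount` and its `count/2` function are made up for illustration and are not part of Elixir. It assumes Elixir 1.8 or later (for the reduce option) and big-endian UTF-16/UTF-32 input, which is what `:unicode.characters_to_binary/3` produces for `:utf16` and `:utf32`.

```elixir
defmodule CodepointCount do
  # Hypothetical helper, not part of Elixir's standard library.
  # The utf8/utf16/utf32 segment types are fixed at compile time,
  # so each encoding gets its own function clause.

  def count(binary, :utf8) do
    for <<_::utf8 <- binary>>, reduce: 0 do
      acc -> acc + 1
    end
  end

  def count(binary, :utf16) do
    # Big-endian by default, matching :unicode.characters_to_binary/3 with :utf16.
    for <<_::utf16 <- binary>>, reduce: 0 do
      acc -> acc + 1
    end
  end

  def count(binary, :utf32) do
    # Big-endian by default, matching :unicode.characters_to_binary/3 with :utf32.
    for <<_::utf32 <- binary>>, reduce: 0 do
      acc -> acc + 1
    end
  end
end

# "h\u00E9llo" is "héllo" with a precomposed é: five codepoints.
text = "h\u00E9llo"
utf16 = :unicode.characters_to_binary(text, :utf8, :utf16)
utf32 = :unicode.characters_to_binary(text, :utf8, :utf32)

IO.inspect(CodepointCount.count(text, :utf8))
IO.inspect(CodepointCount.count(utf16, :utf16))
IO.inspect(CodepointCount.count(utf32, :utf32))
```

All three calls print 5, even though the binaries are 6, 10 and 20 bytes long respectively. For little-endian data you would presumably need a separate clause using a segment such as `utf16-little`.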