1

Let's say I have a string, `string str;`, containing any number of letters, and I want to count how many times each letter occurs in it. For example, the word "Example" has 2 'e', 1 'x', 1 'a', 1 'm', 1 'p', and 1 'l'. Is there a more efficient way of checking for each of these letters than this?

for (int i = 0; i < str.length(); i++)
{
    if (str.at(i) == 'a')
    {
        // increment the variable that keeps track of 'a'
    }
    // ...25 more of these, one for each other letter
}

It feels like there has to be a more efficient way of doing this, but I have no idea how. Please enlighten me.

Remy Lebeau
Skullruss
  • 1
    Use an array the size of the ASCII table, and then you can easily count every character: `array[character]++` – Wahalez Jul 30 '21 at 14:01
  • 2
    Or rather `array[(unsigned char)character]`, since `char`s may or may not be signed. – HolyBlackCat Jul 30 '21 at 14:02
  • 2
    On a side note: using `at()` is overkill in this example, since the `for` loop ensures that it won't go out of bounds of the string, so no need to bounds check each character. Use `str[i]` instead of `str.at(i)` – Remy Lebeau Jul 30 '21 at 16:22
  • @Wahalez -- close, but use the number of possible values that `char` can hold, i.e., `1u << CHAR_BIT`. The character encoding doesn't matter during counting. – Pete Becker Jul 30 '21 at 18:19
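
Putting the comments above together, a minimal sketch might look like the following. It assumes the goal is to count every possible `char` value rather than only letters; the names `kNumCharValues` and `count_bytes` are hypothetical and do not come from the thread. The table has `1u << CHAR_BIT` entries, the index is cast to `unsigned char`, and `str[i]` is used instead of `str.at(i)`.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <string>

// One counter per possible value of unsigned char (256 where CHAR_BIT == 8).
constexpr std::size_t kNumCharValues = std::size_t{1} << CHAR_BIT;

// count_bytes is a hypothetical helper name used only for this sketch.
std::array<std::size_t, kNumCharValues> count_bytes(const std::string& str)
{
    std::array<std::size_t, kNumCharValues> counts{};  // all counters start at zero
    for (std::size_t i = 0; i < str.size(); ++i)
    {
        // Cast first: plain char may be signed, and a negative index would be out of bounds.
        ++counts[static_cast<unsigned char>(str[i])];
    }
    return counts;
}
```

With this, `count_bytes(str)['e']` would be the number of lowercase 'e' characters in `str`. Upper and lower case are counted separately here.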

2 Answers

4

You could use a `std::map`, for example:

#include <cstddef>
#include <map>

std::map<char, std::size_t> mCount{};
for (auto ch : str)
{
   ++mCount[ch];
}
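
As a rough sketch of how such a map could be filled and then read back, assuming the counts are wanted as text: the helper names `count_with_map` and `format_counts` are hypothetical, introduced only for illustration. Iterating a `std::map` visits the keys in ascending order, so only characters that actually occur are listed.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: builds the per-character counts.
std::map<char, std::size_t> count_with_map(const std::string& str)
{
    std::map<char, std::size_t> counts;
    for (char ch : str)
    {
        ++counts[ch];  // operator[] inserts a zero count the first time a key is seen
    }
    return counts;
}

// Hypothetical helper: renders the counts as "key=count " pairs.
std::string format_counts(const std::map<char, std::size_t>& counts)
{
    std::ostringstream out;
    for (const auto& entry : counts)  // ascending key order
    {
        out << entry.first << '=' << entry.second << ' ';
    }
    return out.str();
}
```

For instance, `format_counts(count_with_map("abca"))` yields `"a=2 b=1 c=1 "`.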

Using a std::array (which has the advantage of the data being contiguous in memory, thereby improving cache performance) you can write:

#include <array>
#include <cstddef>
#include <limits>

constexpr auto nNumChars = static_cast<std::size_t>(std::numeric_limits<unsigned char>::max()) + 1;
std::array<std::size_t, nNumChars> arCounts{};

for (auto ch : str) {
   ++arCounts[static_cast<unsigned char>(ch)];
}
Matthias Grün
1

I especially like multisets for this.

Initialization from a string is straightforward.

Walking the set is not as clean.

https://godbolt.org/z/soccz8Tnn

#include <iostream>
#include <set>
#include <string>

int main()
{
    std::string str{"The quick fox jumped over the green fence"};

    // Easy initialization from string
    std::multiset<char> st(cbegin(str), cend(str));

    // Printing the count for each different key is a bit more difficult
    for (auto it{cbegin(st)}; it != cend(st); it = st.upper_bound(*it))
    {
        std::cout << "st[" << *it << "] = " << st.count(*it) << "\n";
    }
}
rturrado
  • 1
    Doesn't it perform a heap allocation for each character? (any character, not only unique ones) :( – HolyBlackCat Jul 30 '21 at 15:04
  • 1
    @HolyBlackCat I'm not sure, but it could easily be that way, yes. I understood OP's reference to efficiency as avoiding writing 26 `if` blocks; and, in that sense, `multiset` is hardly beatable. But I agree that an `array` solution is not much more complicated, and indeed more efficient (for insertions/lookup/memory usage). – rturrado Jul 30 '21 at 15:24
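
For the question's own example, where "Example" is said to have 2 'e', upper and lower case have to be folded together. A minimal sketch of that variant with a 26-entry array is shown below. It assumes an ASCII-compatible character set in which 'a' through 'z' are contiguous, and `count_letters` is a hypothetical name that does not appear in the thread.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: case-insensitive counts for the 26 basic Latin letters.
// Assumes 'a'..'z' are contiguous, as in ASCII.
std::array<std::size_t, 26> count_letters(const std::string& str)
{
    std::array<std::size_t, 26> counts{};  // all counters start at zero
    for (char ch : str)
    {
        // std::tolower expects a value representable as unsigned char (or EOF).
        const int lower = std::tolower(static_cast<unsigned char>(ch));
        if (lower >= 'a' && lower <= 'z')  // skip digits, spaces, punctuation
        {
            ++counts[static_cast<std::size_t>(lower - 'a')];
        }
    }
    return counts;
}
```

With this, `count_letters("Example")['e' - 'a']` is 2, matching the count given in the question. No heap allocation is needed, since the array lives on the stack.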