
I am working on a school project to implement Huffman coding on text. The first part, of course, requires a frequency analysis of the text. Is there a better way to do it than a giant switch and an array of counters?

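i.e.:

int counters[NUM_CHARS] = {0};  // NUM_CHARS: one slot per character of interest

for (std::size_t i = 0; i < inString.length(); i++)
{
    switch (inString[i])
    {
    case 'A':
        counters[0]++;
        break;
    // ...one case per character...
    }
}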

I would like to count all alphanumeric characters and punctuation. I am using C++.

Bill the Lizard
Maynza

3 Answers


Why not:

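#include <iostream>
#include <string>

int counters[256] = {0};  // one counter per possible byte value, all zero
for (std::size_t i = 0; i < inString.length(); i++)
    counters[static_cast<unsigned char>(inString[i])]++;  // plain char may be signed; the cast avoids a negative index

std::cout << "Count of occurrences of 'a': " << counters['a'] << std::endl;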
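Since the goal is Huffman coding, the 256-entry table is usually reduced to just the symbols that actually occur, rarest first. The following is a minimal sketch of that step, assuming byte-oriented input in a `std::string`; `frequencyList` and `Leaf` are hypothetical names chosen for illustration, and the tie-break on byte value is an arbitrary choice to make the order deterministic.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Leaf = std::pair<unsigned char, int>;  // hypothetical: (byte value, occurrence count)

// Counts bytes into a 256-entry table, then keeps only the bytes that occur,
// ordered by ascending count (ties broken by byte value).
std::vector<Leaf> frequencyList(const std::string& inString) {
    int counters[256] = {0};
    for (std::size_t i = 0; i < inString.length(); i++)
        counters[static_cast<unsigned char>(inString[i])]++;

    std::vector<Leaf> leaves;
    for (int b = 0; b < 256; b++)
        if (counters[b] > 0)
            leaves.emplace_back(static_cast<unsigned char>(b), counters[b]);

    // Rarest symbols first: a Huffman builder merges these earliest.
    std::sort(leaves.begin(), leaves.end(), [](const Leaf& x, const Leaf& y) {
        return x.second != y.second ? x.second < y.second : x.first < y.first;
    });
    return leaves;
}
```

For `"abracadabra"` this yields `c:1, d:1, b:2, r:2, a:5`.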
Alexander Gessler

You can use an array indexed by character:

int counters[256];
for (int i = 0; i < inString.length(); i++) {
    counters[(unsigned char)inString[i]]++;
}

You will also want to initialise your counters array to zero, of course.
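One way to make the zero-initialisation impossible to forget is to let the counting function own and return the table. A minimal sketch, assuming C++11 or later; `countBytes` is a hypothetical name for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Returns a histogram of the bytes in inString.
std::array<int, 256> countBytes(const std::string& inString) {
    // The empty braces value-initialise every element to zero; for a plain
    // array, `int counters[256] = {};` has the same effect.
    std::array<int, 256> counters{};
    for (std::size_t i = 0; i < inString.length(); i++) {
        // Cast first: plain char may be signed, and a negative index is undefined behaviour.
        counters[static_cast<unsigned char>(inString[i])]++;
    }
    return counters;
}
```

Usage: `countBytes("hello, world")['l']` is 3.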

Greg Hewgill
  • And for those of us playing the optimization game at home for fun, `for (int i = inString.length()-1; i >= 0 ; i--)` instead. – Amber Feb 28 '10 at 04:16
  • @Dav: if you want to optimize, lift the call to `inString.length()` out of the loop instead. Counting backwards is more often counterproductive, simply because your cache may not expect that -- and a single cache miss will cost more than a lot of comparisons. – Jerry Coffin Feb 28 '10 at 05:45
  • It's more the fact that moving it from the conditional to the initializer results in fewer function calls to `.length()`. But yes, moving it out of the loop also works fine. – Amber Feb 28 '10 at 06:51
  • I usually write that as `for (int i = 0, imax = inString.length(); i < imax; i++)`. – Roland Illig Jun 13 '10 at 09:29
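In C++11 and later, a range-based `for` loop sidesteps the question of where to call `length()`, since there is no index variable at all. A minimal sketch; `countBytesRange` is a hypothetical name for illustration.

```cpp
#include <array>
#include <string>

// Same byte histogram, written with a range-based for loop (C++11 and later).
std::array<int, 256> countBytesRange(const std::string& inString) {
    std::array<int, 256> counters{};   // all zeros
    for (unsigned char c : inString)   // each char is converted to unsigned char on copy
        counters[c]++;
    return counters;
}
```

Declaring the loop variable as `unsigned char` performs the same conversion as the explicit cast in the answers above.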

Using a map seems completely applicable:

#include <map>

std::map<char, int> chcount;
for (std::size_t i = 0; i < inString.length(); i++) {
    char t = inString[i];
    chcount[t]++;  // operator[] inserts a zero count the first time t is seen
}
dagoof
  • This is particularly true if you venture beyond the world of nationalized character sets into the big, wide world of Unicode. – Jerry Coffin Feb 28 '10 at 05:46
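To illustrate the Unicode point: a map can be keyed on whole characters rather than single bytes. A minimal sketch, assuming the text is valid UTF-8 stored in a `std::string` (malformed input is not detected); it relies on the UTF-8 rule that continuation bytes have the form `10xxxxxx`, so a new character starts at every other byte. `countUtf8` is a hypothetical name for illustration.

```cpp
#include <map>
#include <string>

// Counts UTF-8 encoded code points, keyed by their byte sequence.
// Assumes valid UTF-8 input; no validation is performed.
std::map<std::string, int> countUtf8(const std::string& inString) {
    std::map<std::string, int> counts;
    std::string current;  // bytes of the code point being assembled
    for (char ch : inString) {
        unsigned char byte = static_cast<unsigned char>(ch);
        bool isContinuation = (byte & 0xC0) == 0x80;  // 10xxxxxx
        if (!isContinuation && !current.empty()) {
            counts[current]++;  // previous code point is complete
            current.clear();
        }
        current += ch;
    }
    if (!current.empty())
        counts[current]++;  // flush the final code point
    return counts;
}
```

With this keying, `"é"` (bytes `C3 A9`) counts as one symbol instead of two separate bytes.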