I was going through a great article on LZW compression algorithm by Mark Nelson, and found something in the code I haven't yet encountered.
In the code, he used unordered_map
to store strings and their corresponding frequency. The declaration of the map was:
std::unordered_map<std::string, unsigned int> codes( (max_code * 11)/10 );
max_code
stores the maximum number of entries in the map, i.e. 32767. The code:
void compress( INPUT &input, OUTPUT &output, const unsigned int max_code = 32767 )
{
//code
}
I am unaware as to what parameter does an unsigned int
value associated with codes
hold. Also, could someone enlighten me as to why the max_code
value is multiplied by 11 and then divided by 10?
Here is the compress function for reference:
template<class INPUT, class OUTPUT>
void compress( INPUT &input, OUTPUT &output, const unsigned int max_code = 32767 )
{
input_symbol_stream<INPUT> in( input );
output_code_stream<OUTPUT> out( output, max_code );
std::unordered_map<std::string, unsigned int> codes( (max_code * 11)/10 );
for ( unsigned int i = 0 ; i < 256 ; i++ )
codes[std::string(1,i)] = i;
unsigned int next_code = 257;
std::string current_string;
char c;
while ( in >> c ) {
current_string = current_string + c;
if ( codes.find(current_string) == codes.end() ) {
if ( next_code <= max_code )
codes[ current_string ] = next_code++;
current_string.erase(current_string.size()-1);
out << codes[current_string];
current_string = c;
}
}
if ( current_string.size() )
out << codes[current_string];
}