To reduce the character to 5 bits, you can use either ch & 0x1F
or ch - 'A'; the first maps 'A'..'Z' to 1..26, the second to
0..25. Neither will work with EBCDIC, but that's likely not an
issue. (If it is: a table lookup in a string of all of the
capital letters, returning the index, can be used.)
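
The three reductions mentioned above can be sketched as follows. The function names are made up for illustration, and the first two assume an ASCII-compatible character set; only the table lookup is independent of the encoding.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: 'A'..'Z' -> 0..25, assuming the letters are
// contiguous in the character set (true for ASCII, not for EBCDIC).
int fiveBitsBySubtraction( char ch )
{
    assert( ch >= 'A' && ch <= 'Z' );
    return ch - 'A';
}

// Hypothetical helper: keeps the low five bits. In ASCII this gives
// 'A'..'Z' -> 1..26 (lowercase letters give the same values).
int fiveBitsByMask( char ch )
{
    return ch & 0x1F;
}

// Hypothetical helper: encoding-independent table lookup. Returns the
// index of ch in the alphabet (0..25), or -1 if ch is not a capital
// letter.
int letterIndex( char ch )
{
    static std::string const letters( "ABCDEFGHIJKLMNOPQRSTUVWXYZ" );
    std::string::size_type pos = letters.find( ch );
    return pos == std::string::npos ? -1 : static_cast<int>( pos );
}
```

Whichever variant is used, the decoder has to undo the same mapping, so it is worth picking one and using it consistently.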
After that, it gets complicated. The simplest solution is to
define a bit array, something like:

    #include <vector>

    class BitArray
    {
        std::vector<unsigned char> myData;

        static int byteIndex( int index ) { return index / 8; }
        static int bitIndex( int index ) { return index % 8; }
        static unsigned char bitMask( int index )
        {
            return static_cast<unsigned char>( 1 << bitIndex( index ) );
        }
        static int byteCount( int bitCount )
        {
            //  Round up, so that a partially used last byte is allocated.
            return byteIndex( bitCount )
                + (bitIndex( bitCount ) != 0 ? 1 : 0);
        }
    public:
        explicit BitArray( int size ) : myData( byteCount( size ) ) {}
        void set( int index )
        {
            myData[byteIndex( index )] |= bitMask( index );
        }
        void reset( int index )
        {
            myData[byteIndex( index )] &= ~bitMask( index );
        }
        bool test( int index ) const
        {
            return (myData[byteIndex( index )] & bitMask( index )) != 0;
        }
    };

(You'll need more to extract the data, but I'm not sure in what
format you need it.)
You then loop over your string:
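
    BitArray results( static_cast<int>( 5 * s.size() ) );
    for ( int index = 0; index != static_cast<int>( s.size() ); ++ index ) {
        int code = s[index] - 'A';      //  or s[index] & 0x1F
        for ( int pos = 0; pos != 5; ++ pos ) {
            //  Only set the bits which are set in the 5 bit code.
            if ( (code & (1 << pos)) != 0 ) {
                results.set( 5 * index + pos );
            }
        }
    }
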
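
To check that the packing is reversible, the codes can be read back out of the bytes. The following is only a sketch with a hypothetical function name; it assumes that each character was stored as ch - 'A', least significant bit first, and that bit i of the stream lives in byte i / 8 under the mask 1 << (i % 8), as in the bit array above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: reads `count` 5-bit codes back out of `bytes`
// and maps each code c to the character 'A' + c. Codes above 25 are
// not checked here.
std::string unpackFiveBit( std::vector<unsigned char> const& bytes,
                           int count )
{
    assert( count >= 0
            && static_cast<std::size_t>( count ) * 5 <= bytes.size() * 8 );
    std::string result;
    for ( int index = 0; index != count; ++ index ) {
        int code = 0;
        for ( int pos = 0; pos != 5; ++ pos ) {
            int bit = 5 * index + pos;          // position in the bit stream
            if ( (bytes[bit / 8] & (1 << (bit % 8))) != 0 ) {
                code |= 1 << pos;
            }
        }
        result += static_cast<char>( 'A' + code );
    }
    return result;
}
```

The character count has to be passed in (or stored alongside the data), because the padding bits in the last byte cannot be told apart from a trailing 'A'.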
This will work without problems. When I tried it (or rather the
equivalent) in the distant past (for Huffman encoding, in C,
since this was in the 1980s), it was also far too slow. If your
strings are fairly short, it may well be sufficient today.
Otherwise, you'll need a more complicated algorithm, which keeps
track of how many bits are already used in the last byte, and
does the appropriate shifts and masks to insert as many bits as
possible in one go: at most two shift-and-or operations per
insertion, rather than one operation per bit (five in all) as is
the case here. This is what I ended up using. (But I don't have
the code anymore, so I can't easily post an example.)
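
A sketch of that kind of algorithm is given below. It is not the original code, and it uses a small accumulator rather than patching the last byte in place; the function name is hypothetical, and it assumes capital ASCII letters stored as ch - 'A', least significant bit first, so that the byte layout is the same as with the bit array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical packer: inserts each 5-bit code with one shift and one
// or, and flushes a byte whenever eight or more bits have accumulated.
std::vector<unsigned char> packFiveBit( std::string const& s )
{
    std::vector<unsigned char> out;
    out.reserve( (5 * s.size() + 7) / 8 );
    unsigned pending = 0;       // bits not yet written, lowest bit first
    int pendingCount = 0;       // valid bits in pending; < 8 between steps
    for ( std::size_t i = 0; i != s.size(); ++ i ) {
        assert( s[i] >= 'A' && s[i] <= 'Z' );
        unsigned code = static_cast<unsigned>( s[i] - 'A' );
        pending |= code << pendingCount;    // append above the pending bits
        pendingCount += 5;
        if ( pendingCount >= 8 ) {          // at most one full byte is ready
            out.push_back( static_cast<unsigned char>( pending & 0xFF ) );
            pending >>= 8;
            pendingCount -= 8;
        }
    }
    if ( pendingCount != 0 ) {              // partial last byte, zero padded
        out.push_back( static_cast<unsigned char>( pending ) );
    }
    return out;
}
```

Since fewer than 8 bits are pending before each insertion, the accumulator never holds more than 12 bits, and a single flush per character is enough.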