I'm trying to implement a Huffman compressor in C++.
In brief, I have 5 classes:
HuffmanTree - represents a tree structure
TreeNode - represents a tree structure
HuffmanArchiver - compress/decompress etc.
BitStringWrite - writing bits.
BitStringRead - reading bits.
(The full implementation is here: headers and cpp's)
I can build a code table and encode a binary file, but I have some questions about reading/writing and about the decoding phase.
During the encoding phase, I first save the Huffman tree to the new file like this:
void Archiver::encodeTree(BitStringWrite& bw, TreeNode* node){
    if (node -> isLeaf()) {
        bw.writeBit(1);
        char symb = node->getChar();
        bw.getStream().write(&symb, sizeof(symb));
    }
    else {
        bw.writeBit(0);
        encodeTree(bw, node->getLeftTree());
        encodeTree(bw, node->getRightTree());
    }
}
Here bw is an instance of the class BitStringWrite, which is implemented like this:
BitStringWrite::BitStringWrite(std::ostream &_out_f) : _byte(0), _pos(0), _out_f(_out_f) {}

void BitStringWrite::writeBit(bool bit) {
    if (_pos == 8)
        flush();
    if (bit == 1) {
        _byte |= (1 << (7 - _pos));
    }
    _pos++;
}

void BitStringWrite::writeByte(char b){
    for(int i = 0; i < 8; i++)
        this -> writeBit((b >> i) & 1); //?????
}

void BitStringWrite::flush() {
    if (_pos != 0) {
        _out_f.write(&_byte, sizeof(char));
        _pos = 0;
        _byte = 0;
    }
}

std::ostream& BitStringWrite::getStream(){
    return _out_f;
}
I'm not sure about my writeByte implementation, but the main question here is: why would I want to implement a writeByte function if I already have ostream::write?
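To make the difference between the two paths concrete, here is a minimal sketch. The class name BitWriter and the helper leafRecord are hypothetical, and the MSB-first order in writeByte is an assumption chosen to match the `1 << (7 - _pos)` mask in writeBit above. A byte sent through writeByte is packed into the same bit buffer as the bits before it, whereas a direct write on the underlying stream bypasses whatever partial byte is still pending.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for BitStringWrite; not the original class.
class BitWriter {
public:
    explicit BitWriter(std::ostream& out) : out_(out) {}

    // Packs bits MSB-first, like the mask 1 << (7 - _pos) in the post.
    void writeBit(bool bit) {
        if (bit) byte_ = static_cast<unsigned char>(byte_ | (1u << (7 - pos_)));
        if (++pos_ == 8) flush();
    }

    // Assumption: emit the most significant bit first, so the byte keeps
    // its natural bit order inside the packed stream.
    void writeByte(unsigned char b) {
        for (int i = 7; i >= 0; --i) writeBit(((b >> i) & 1u) != 0);
    }

    // Writes the pending (possibly partial) byte, padded with zero bits.
    void flush() {
        if (pos_ != 0) {
            out_.put(static_cast<char>(byte_));
            byte_ = 0;
            pos_ = 0;
        }
    }

private:
    std::ostream& out_;
    unsigned char byte_ = 0;
    int pos_ = 0;
};

// Serializes one leaf as: marker bit 1, then the 8 symbol bits.
std::string leafRecord(unsigned char symbol) {
    std::ostringstream out;
    BitWriter bw(out);
    bw.writeBit(true);
    bw.writeByte(symbol);
    bw.flush();
    return out.str();
}
```

With this layout, leafRecord('a') yields the two bytes 10110000 10000000: the marker bit followed directly by 01100001, then zero padding.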
For example:
> cat test.in
aaaabc
The buildTable function will produce: a = 1, b = 010, c = 00 and \n = 011 (it seems like the last symbol is just a \n).
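As a quick cross-check of this table, the sketch below concatenates the codes for the sample input. The function names encodeWithTable and exampleTable are hypothetical helpers, not part of my classes, and the bits are kept as a text string purely for readability.

```cpp
#include <map>
#include <string>

// The code table from the example: a = 1, b = 010, c = 00, '\n' = 011.
std::map<char, std::string> exampleTable() {
    return {{'a', "1"}, {'b', "010"}, {'c', "00"}, {'\n', "011"}};
}

// Concatenates the code of every input character into one bit string.
// Throws std::out_of_range if a character has no entry in the table.
std::string encodeWithTable(const std::string& text,
                            const std::map<char, std::string>& table) {
    std::string bits;
    for (char ch : text) bits += table.at(ch);
    return bits;
}
```

For the input "aaaabc\n" this gives the 12 bits 111101000011.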
xxd -b test.out
00000000: 01100011 01100010 00001010 01100001 00101111 11101000 cb.a/.
00000006: 01101100
Note that the encoded message starts at the last bit of the fifth byte. The first five (almost) bytes represent the tree structure.
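The layout of those first bytes can be reproduced with a small sketch. It assumes the tree shape implied by the code table (0 = left, 1 = right) and uses a hypothetical Node type rather than my TreeNode. It walks the tree in the same pre-order as encodeTree and collects the marker bits and the symbols separately, since the symbols are written straight to the stream while the marker bits wait in the bit buffer.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical minimal node type, used only for this illustration.
struct Node {
    char symbol = 0;
    std::unique_ptr<Node> left, right;
    bool isLeaf() const { return !left && !right; }
};

std::unique_ptr<Node> leaf(char c) {
    auto n = std::make_unique<Node>();
    n->symbol = c;
    return n;
}

std::unique_ptr<Node> inner(std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Tree implied by the codes c = 00, b = 010, '\n' = 011, a = 1.
std::unique_ptr<Node> exampleTree() {
    return inner(inner(leaf('c'), inner(leaf('b'), leaf('\n'))), leaf('a'));
}

// Pre-order walk: 1 marks a leaf, 0 marks an internal node.
// Marker bits and symbols are collected in separate strings.
void walk(const Node& n, std::string& structureBits, std::string& rawSymbols) {
    if (n.isLeaf()) {
        structureBits += '1';
        rawSymbols += n.symbol;
    } else {
        structureBits += '0';
        walk(*n.left, structureBits, rawSymbols);
        walk(*n.right, structureBits, rawSymbols);
    }
}
```

The walk produces the symbols c, b, \n, a and the marker bits 0010111, which matches the dump: four raw symbol bytes first, then a fifth byte whose first seven bits are 0010111.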
Ok, it seems like the encoding phase is working. Let's now proceed to the decoding phase.
The main function for the decoding phase is decompress. It invokes the decodeTree function to decode the Huffman tree, then generates a code table based on this tree and then decodes the text.
The function decodeTree doesn't work properly:
TreeNode* Archiver::decodeTree(BitStringRead& br, TreeNode* cur){
    if (br.readBit()) {
        return new TreeNode(br.readByte(), 0, false, NULL, NULL);
    }
    else {
        TreeNode* left = decodeTree(br, cur-> getLeftTree());
        TreeNode* right = decodeTree(br, cur-> getRightTree());
        decodeTree(br, cur-> getRightTree());
        return new TreeNode(0, 0, false, left, right);
    }
}
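For comparison, here is a minimal sketch of a pre-order decoder that needs no cur parameter and reads exactly one subtree per call. It rests on an assumption that does not hold for my current encodeTree: the 8 symbol bits are stored in the same bit stream, directly after the leaf marker, MSB-first. BitReader, Node and decodeTreeFromBytes are hypothetical names, and end-of-file handling is left out.

```cpp
#include <istream>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical MSB-first bit reader; EOF is not handled in this sketch.
class BitReader {
public:
    explicit BitReader(std::istream& in) : in_(in) {}

    bool readBit() {
        if (pos_ == 8) {              // current byte exhausted, fetch the next
            byte_ = static_cast<unsigned char>(in_.get());
            pos_ = 0;
        }
        bool bit = ((byte_ >> (7 - pos_)) & 1u) != 0;
        ++pos_;
        return bit;
    }

    // Reads the next 8 bits, continuing from the current bit position.
    unsigned char readByte() {
        unsigned char b = 0;
        for (int i = 0; i < 8; ++i)
            b = static_cast<unsigned char>((b << 1) | (readBit() ? 1u : 0u));
        return b;
    }

private:
    std::istream& in_;
    unsigned char byte_ = 0;
    int pos_ = 8;                     // forces a fetch on the first readBit
};

struct Node {
    unsigned char symbol = 0;
    std::unique_ptr<Node> left, right;
};

// Bit 1: leaf followed by its symbol. Bit 0: internal node, then the left
// subtree and the right subtree, each decoded exactly once.
std::unique_ptr<Node> decodeTree(BitReader& br) {
    auto node = std::make_unique<Node>();
    if (br.readBit()) {
        node->symbol = br.readByte();
    } else {
        node->left = decodeTree(br);
        node->right = decodeTree(br);
    }
    return node;
}

std::unique_ptr<Node> decodeTreeFromBytes(const std::string& bytes) {
    std::istringstream in(bytes);
    BitReader br(in);
    return decodeTree(br);
}
```

Under that assumed format, the three bytes 01011000 01101100 01000000 decode to a root with the leaves 'a' (left) and 'b' (right).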
I think the main reason is that it can't properly read the tree structure using br, an instance of the class BitStringRead.
Look how it's implemented inside:
BitStringRead::BitStringRead(std::istream &_in_f) : _pos(8), _in_f(_in_f) {}

bool BitStringRead::readBit() {
    if (_pos == 8) {
        _in_f.read(&_byte, sizeof(char));
        _pos = 0;
    }
    return (_byte >> _pos++) & (char)1;
}

char BitStringRead::readByte() {
    char sym = (char)0;
    for (int i = 0; i < 8; i++){
        sym |= ((1 & readBit()) << (i));
    }
    return sym;
}
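One detail worth checking here is the bit order: writeBit places its first bit at mask `1 << (7 - _pos)`, while readBit extracts with `_byte >> _pos`. The sketch below (hypothetical helper names) lists the bits of one byte in both orders, so the two conventions can be compared directly.

```cpp
#include <string>

// Order in which a reader using (byte >> pos) & 1 returns the bits:
// least significant bit first.
std::string bitsLsbFirst(unsigned char byte) {
    std::string out;
    for (int pos = 0; pos < 8; ++pos)
        out += ((byte >> pos) & 1u) ? '1' : '0';
    return out;
}

// Order matching a writer that stores bit number pos at 1 << (7 - pos):
// most significant bit first.
std::string bitsMsbFirst(unsigned char byte) {
    std::string out;
    for (int pos = 0; pos < 8; ++pos)
        out += ((byte >> (7 - pos)) & 1u) ? '1' : '0';
    return out;
}
```

For the byte 0001 0110, the MSB-first order gives 00010110 (the order in which the bits were written), while the LSB-first order gives 01101000.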
Assume we are at the beginning of a file and I have a byte 0001 0110. I invoke the readBit function for the first time. It reads the first 8 bits. Then I invoke it 3 more times; it does not read anything from the stream, but just returns the values of these bits. The first 1 in the string denotes a leaf node, and I know that after a leaf node there is a symbol, so I read it.
I think it starts reading from the ninth bit, not from the fourth, because of the readBit implementation.
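To test that suspicion, the position bookkeeping of readBit and readByte can be traced in isolation. The sketch below (the function name is hypothetical) copies only the `_pos == 8` logic from above, without any stream, and records which bit slot each of the 8 readBit calls inside readByte consumes. Slots are counted in the order the reader consumes them, 0 being the first bit it ever returns.

```cpp
#include <vector>

// Simulates bitsAlreadyRead calls to readBit, then the 8 calls made by
// readByte, and returns the consumed slots of those last 8 calls.
std::vector<int> offsetsReadByReadByte(int bitsAlreadyRead) {
    int pos = 8;         // same starting value as BitStringRead::_pos
    int byteIndex = -1;  // no byte fetched from the stream yet
    auto readBit = [&]() {
        if (pos == 8) {  // buffer exhausted: the next byte would be fetched
            ++byteIndex;
            pos = 0;
        }
        int offset = byteIndex * 8 + pos;
        ++pos;
        return offset;
    };

    for (int i = 0; i < bitsAlreadyRead; ++i) readBit();

    std::vector<int> offsets;
    for (int i = 0; i < 8; ++i) offsets.push_back(readBit());
    return offsets;
}
```

After 4 earlier readBit calls, the trace reports slots 4 through 11. So with the code as posted, readByte would continue with the fifth bit of the current byte and run into the first half of the next one, rather than starting at a byte boundary.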