If your Huffman coder returns an array of 1s and 0s representing the bits that should and should not be set in the output, you can shift these bits onto an unsigned char. After every eight shifts, you start writing to the next character, ultimately outputting an array of unsigned char. The number of compressed characters you output is equal to the number of bits divided by eight, rounded up to the nearest whole number.
In C, this is a relatively simple function, consisting of a left shift (<<) and a bitwise OR (|). Here is the function, with an example to make it runnable. To see it with more extensive comments, please refer to this GitHub gist.
#include <stdlib.h>
#include <stdio.h>

#define BYTE_SIZE 8

size_t compress_code(const int *code, const size_t code_length, unsigned char **compressed)
{
    if (code == NULL || code_length == 0 || compressed == NULL) {
        return 0;
    }

    // Bytes needed: code_length / BYTE_SIZE, rounded up
    size_t compressed_length = (code_length + BYTE_SIZE - 1) / BYTE_SIZE;
    *compressed = calloc(compressed_length, sizeof(unsigned char));
    if (*compressed == NULL) {
        // Allocation failed, so nothing was compressed
        return 0;
    }

    for (size_t char_counter = 0, i = 0; char_counter < compressed_length && i < code_length; ++i) {
        // Move on to the next output byte after every eight bits
        if (i > 0 && (i % BYTE_SIZE) == 0) {
            ++char_counter;
        }
        // Shift the bits set so far left by one
        (*compressed)[char_counter] <<= 1;
        // Put the next bit onto the end of the unsigned char
        (*compressed)[char_counter] |= (code[i] & 1);
    }

    // Pad the remaining space in the last byte with 0s on the right-hand side
    (*compressed)[compressed_length - 1] <<= compressed_length * BYTE_SIZE - code_length;
    return compressed_length;
}

int main(void)
{
    const int code[] = { 0, 1, 0, 0, 0, 0, 0, 1,   // 65: A
                         0, 1, 0, 0, 0, 0, 1, 0 }; // 66: B
    const size_t code_length = 16;
    unsigned char *compressed = NULL;
    size_t compressed_length = compress_code(code, code_length, &compressed);

    for (size_t i = 0; i < compressed_length; ++i) {
        printf("%c\n", compressed[i]);
    }

    // The caller owns the buffer allocated by compress_code
    free(compressed);
    return 0;
}
You can then just write the characters in the array to a file, or even copy the array's memory directly to a file, to write the compressed output.
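As a rough sketch of that file-writing step, something like the following should work. The helper name write_compressed and its return convention are my own assumptions for illustration, not part of the code above. It opens the file in binary mode and copies the whole array out with a single fwrite call.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (name and return convention are assumptions):
 * writes `length` bytes from `data` to the file at `path`.
 * Returns 0 on success and -1 on any failure. */
int write_compressed(const char *path, const unsigned char *data, size_t length)
{
    if (path == NULL || data == NULL) {
        return -1;
    }

    /* "wb" avoids newline translation on platforms that distinguish
     * between text and binary streams */
    FILE *file = fopen(path, "wb");
    if (file == NULL) {
        return -1;
    }

    /* Copy the array's memory to the file in one call */
    size_t written = fwrite(data, sizeof *data, length, file);
    int close_failed = (fclose(file) != 0);

    return (written == length && !close_failed) ? 0 : -1;
}
```

With the example above, you would call it as write_compressed("out.bin", compressed, compressed_length) before freeing the buffer, where "out.bin" is just a placeholder file name.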
Reading the compressed characters into bits, which will allow you to traverse your Huffman tree for decoding, is done with right shifts (>>) and checking the rightmost bit with bitwise AND (&).
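Here is a minimal sketch of that reverse step, assuming the bits were packed most-significant bit first as in the function above. The name expand_code is hypothetical, and so is the choice to pass the number of valid bits as a parameter. The decoder needs that count from somewhere, because the padding 0s in the last byte look the same as real data.

```c
#include <stddef.h>
#include <stdlib.h>

#define BYTE_SIZE 8

/* Hypothetical counterpart to compress_code (name and signature are
 * assumptions): expands the first `bit_count` bits of `compressed` into a
 * newly allocated array of 0s and 1s. Returns the number of bits written,
 * or 0 on bad input or allocation failure. The caller frees *code. */
size_t expand_code(const unsigned char *compressed, size_t bit_count, int **code)
{
    if (compressed == NULL || bit_count == 0 || code == NULL) {
        return 0;
    }

    *code = malloc(bit_count * sizeof **code);
    if (*code == NULL) {
        return 0;
    }

    for (size_t i = 0; i < bit_count; ++i) {
        unsigned char byte = compressed[i / BYTE_SIZE];
        /* The first bit packed ended up leftmost, so bit i of the stream
         * sits this many places from the right of its byte */
        size_t shift = BYTE_SIZE - 1 - (i % BYTE_SIZE);
        /* Shift the wanted bit into the rightmost position and mask it */
        (*code)[i] = (byte >> shift) & 1;
    }

    return bit_count;
}
```

Each recovered bit can then drive one step down the Huffman tree, for example going left on 0 and right on 1 if that is how your tree was built.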