My question is specific. The theory of Huffman coding is easy enough to understand, but it produces codes that usually do not align to byte boundaries, and the practical methods for dealing with that are not covered in the tutorials I have come across so far.
There are two problems:
(1) Once a file is encoded, the end of the resulting Huffman-coded data may not fall on a byte boundary. How does the decoder know that it has reached the end of the Huffman-coded data in a compressed file? (See the bit-writer sketch after this list.)
(2) Provided a Huffman table is included in the file to aid decompression, how is such a table created in practice, since we again run into non-alignment with byte boundaries? The symbols themselves may be 8 or 16 bits, but a Huffman code can be any number of bits. If we include one Huffman code per symbol, we also have to store how many bits long each code is, so that the decoder can build a binary tree or some other data structure for decompression. (See the table-serialization sketch below.)
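To make problem (1) concrete, here is a minimal bit-writer sketch (my own illustrative names, not from any standard): the encoder has to pad the final partial byte, so the decoder needs out-of-band information to know where the real data stops, typically an original symbol/byte count stored in a header, or a dedicated EOF symbol given its own Huffman code.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit writer: accumulates bits MSB-first and flushes
 * whole bytes to a file. Initialize acc = 0, nbits = 0 before use. */
typedef struct {
    FILE    *fp;
    uint8_t  acc;    /* partially filled output byte    */
    int      nbits;  /* number of bits currently in acc */
} BitWriter;

void bw_put(BitWriter *bw, uint32_t code, int len)
{
    /* emit the code MSB-first, one bit at a time (simple, not fast) */
    for (int i = len - 1; i >= 0; i--) {
        bw->acc = (uint8_t)((bw->acc << 1) | ((code >> i) & 1u));
        if (++bw->nbits == 8) {
            fputc(bw->acc, bw->fp);
            bw->acc = 0;
            bw->nbits = 0;
        }
    }
}

void bw_flush(BitWriter *bw)
{
    /* pad the final partial byte with zero bits; these padding bits
     * are exactly why the decoder needs a stored symbol count or an
     * EOF symbol to know when to stop decoding */
    if (bw->nbits > 0) {
        bw->acc = (uint8_t)(bw->acc << (8 - bw->nbits));
        fputc(bw->acc, bw->fp);
        bw->nbits = 0;
    }
}
```

For problem (2), one simple scheme (again a sketch under my own naming, not a standard layout) stores one (symbol, bit length, code) triple per symbol, using fixed-width fields so the table itself stays byte-aligned. A canonical Huffman code avoids storing the codes entirely: only the lengths are written, and the decoder regenerates the codes deterministically from them.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical table entry: one per symbol present in the tree. */
typedef struct {
    uint8_t  symbol;  /* the 8-bit source symbol             */
    uint8_t  length;  /* Huffman code length in bits (1..16) */
    uint16_t code;    /* the code itself, right-aligned      */
} HuffEntry;

/* Write a fixed-width table: a 16-bit entry count, then
 * (symbol, length, code) triples. Wasteful but unambiguous;
 * the decoder reads the same fixed-width fields back and
 * rebuilds its tree or lookup table from them. */
void write_table(FILE *fp, const HuffEntry *tab, uint16_t count)
{
    fputc(count & 0xFF, fp);
    fputc(count >> 8, fp);
    for (uint16_t i = 0; i < count; i++) {
        fputc(tab[i].symbol, fp);
        fputc(tab[i].length, fp);
        fputc(tab[i].code & 0xFF, fp);
        fputc(tab[i].code >> 8, fp);
    }
}
```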
Huffman and arithmetic coding seem to be used in a lot of compression systems, so this question keeps popping up.
I am trying to understand how this is done in JPEG, and will be building an encoder in C on a Nios II soft-core processor in an FPGA to save JPEG files from a camera to an SD card.
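For JPEG specifically, both problems are solved by the format itself. The entropy-coded scan is delimited by markers: padding bits in the final byte are set to 1, and any 0xFF byte in the compressed data is followed by a stuffed 0x00 so it cannot be mistaken for a marker, which means the decoder simply reads until it hits the next real marker (e.g. EOI, 0xFFD9). The DHT marker segment stores each table as 16 byte-counts (how many codes there are of each length 1..16) followed by the symbols in code order; the codes themselves are never stored because they are canonical. The sketch below regenerates them, following the procedure in the JPEG spec (ITU-T T.81, Annex C):

```c
#include <stdint.h>

/* Rebuild canonical Huffman codes from a JPEG DHT segment:
 * bits[i] = number of codes of length i+1 (i = 0..15); the DHT
 * segment then lists the symbols in code order, so the decoder
 * pairs codes[k]/lengths[k] with the k-th symbol. The caller must
 * size codes[] and lengths[] to the sum of all bits[] counts.
 * This mirrors Generate_size_table / Generate_code_table in
 * ITU-T T.81, Annex C. */
void jpeg_make_codes(const uint8_t bits[16],
                     uint16_t codes[], uint8_t lengths[])
{
    uint16_t code = 0;
    int k = 0;
    for (int len = 1; len <= 16; len++) {
        for (int i = 0; i < bits[len - 1]; i++) {
            codes[k]   = code++;
            lengths[k] = (uint8_t)len;
            k++;
        }
        code <<= 1;  /* moving to the next length appends a zero bit */
    }
}
```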