I am working on this Huffman encoding project, but I am stuck on how to use this code table to write the Huffman codes into a "bin" file using Java. The list in the red box is a String containing comma-separated groups of zeros and ones of different lengths.
-
Welcome to Stack Overflow. Please take the [tour] to learn how Stack Overflow works and read [ask] on how to improve the quality of your question. Then check the [help/on-topic] to see what questions you can ask. Please see: [How do I ask and answer homework questions?](https://meta.stackoverflow.com/q/334822) Please show your attempts you have tried and the problems/error messages you get from your attempts. – Progman Nov 24 '20 at 19:40
-
Oops. I should have retracted my close-vote before answering.... Have my re-open vote. – Yunnosch Nov 24 '20 at 20:32
-
I have edited your question to make it more clear. I admit that I changed it according to my understanding of the answer, not only of the question. Please double check that I did not put anything into your mouth/post which you cannot agree with. (And please accept my apology if I did.) – Yunnosch Nov 24 '20 at 20:40
1 Answer
The question of how to store those values is easily answered.
You store them as one single 16-bit value: the six values concatenated are binary 0000010110110111, i.e. 0x05B7, i.e. decimal 1463.
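As a minimal sketch of that packing step in Java: the class name `HuffmanBitWriter`, the method `pack`, the file name `codes.bin`, and the exact input string are all hypothetical, chosen only for illustration. It assumes the codes arrive as one comma-separated String (as described in the question) and that bits are packed most significant bit first, with zero padding if the total is not a multiple of 8.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class HuffmanBitWriter {

    // Concatenates the comma-separated code strings and packs the bits into bytes,
    // most significant bit first. A trailing partial byte is padded with zero bits.
    static byte[] pack(String commaSeparatedCodes) {
        StringBuilder bits = new StringBuilder();
        for (String code : commaSeparatedCodes.split(",")) {
            bits.append(code.trim());
        }
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= (byte) (0x80 >> (i % 8));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Assumed input: the six codes discussed in this answer.
        byte[] packed = pack("000,001,01,10,110,111");
        // Writes the raw bytes (not the characters '0' and '1') to the file.
        Files.write(Paths.get("codes.bin"), packed);
        System.out.printf("%02X %02X%n", packed[0] & 0xFF, packed[1] & 0xFF); // prints "05 B7"
    }
}
```

If the total number of bits is not a multiple of 8, the reader also needs to know how many bits are valid, for example by storing the bit count alongside the data.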
Much more interesting is the question of how to read those six values back from the binary file, i.e. how do you turn a read 0x05B7 back into six values? The most important and least simple part of that is knowing where each value ends. I.e. why is the first value 000, but not 00 and not 0000?
The answer is based on the Huffman tree, which is obviously related to the encoding.
This is it:

            ?
           / \
          /   \
         0     1
        / \   / \
       0   1 0   1
      0 1  | |  0 1
      a b  c d  e f

I.e. there is a symbol a == 000. But there is no symbol represented by 0, or 00, or 0000.
So the first symbol must be the a.
The next is then either 00 (nope), or 0010 (nope), or 001 == b.
And so on.
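The walk described above can be sketched in Java as follows. This is only an illustration under stated assumptions: the class `HuffmanPrefixDecoder`, its methods, and the explicit `bitCount` parameter are hypothetical, and the code table is the one read off the tree in this answer (a=000, b=001, c=01, d=10, e=110, f=111). Bits are read most significant bit first, and a symbol is emitted as soon as the collected bits match a code, which works because no code is a prefix of another.

```java
import java.util.HashMap;
import java.util.Map;

class HuffmanPrefixDecoder {

    // Code table taken from the tree in this answer.
    static Map<String, Character> exampleTable() {
        Map<String, Character> table = new HashMap<>();
        table.put("000", 'a');
        table.put("001", 'b');
        table.put("01", 'c');
        table.put("10", 'd');
        table.put("110", 'e');
        table.put("111", 'f');
        return table;
    }

    // Reads bitCount bits from data (most significant bit first) and emits a symbol
    // every time the bits collected so far equal one of the codes in the table.
    static String decode(byte[] data, int bitCount, Map<String, Character> table) {
        StringBuilder symbols = new StringBuilder();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < bitCount; i++) {
            int bit = (data[i / 8] >> (7 - i % 8)) & 1;
            current.append(bit);
            Character symbol = table.get(current.toString());
            if (symbol != null) {
                symbols.append(symbol);
                current.setLength(0); // start collecting the next code
            }
        }
        return symbols.toString();
    }

    public static void main(String[] args) {
        byte[] data = {0x05, (byte) 0xB7}; // the 16-bit value 0x05B7
        System.out.println(decode(data, 16, exampleTable())); // prints "abcdef"
    }
}
```

Looking up the growing bit string in a map is equivalent to walking the tree from the root and stopping at a leaf; a real decoder would more likely walk tree nodes directly.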
The Huffman tree (if created meaningfully) is based on the analysis that the symbols c and d occur so much more often in the input data that giving them a two-bit representation, while the other four symbols get longer three-bit representations, reduces the total length. I guess that was the topic of the preceding exercises in your class.

-
Hi, thank you so much for your answer. I have just computed the code table based on how the Huffman tree was constructed. What I have right now are two lists: one is the list of the elements and the other is a list of Huffman codes. However, the codes are stored in a list of Strings. How do I write these strings in a binary format? I need to store all codes in a bin file. – Ray Nov 25 '20 at 01:06
-
The list which you want to write into a bin file is not the one in the red box? In that case please show the list to write. If it IS the red box, please explain what you need explained beyond what I wrote in my answer. Is it the details of how to write 16-bit values to a file in binary mode? Then all the mentioning of Huffman encoding confused me. Or is this about storing the Huffman encoding table in the same file as the encoded data? Please provide a [mre] of the code which has the data structures in which everything is stored, assuming that it exists, because you only ask about storing. – Yunnosch Nov 25 '20 at 06:29