I am really struggling with the order of merging trees that have the same "weight" in Huffman coding. I looked into a lot of sources, but all of them seem to cover just "simple cases" where there are no more than two elements with the same weight, or they just don't cover the topic at all.
Let's say I have the following string I want to encode: ABCDEE. (Style based on this website)
So I have:
FREQUENCY VALUE
--------- -----
1 A
1 B
1 C
1 D
2 E
I start building the tree now with two of the smallest elements:
Question 1) Do I have to use A & B, or how am I able to decide which values I should use? I know they have to be the smallest ones, but other than that? E.g. A & D?
This is important because of what happens later. Let's say I do the following:
    2:[A&B]        2:[C&D]
    /    \         /    \
  1:A    1:B     1:C    1:D
and with that the following table:
FREQUENCY VALUE
--------- -----
2 [A&B]
2 [C&D]
2 E
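For reference, the two merges that lead to this table can be sketched with a priority queue in Python. The `order` field is a hypothetical tie-breaker I made up for illustration (alphabetical position for leaves, creation order for merged nodes); as far as I can tell nothing in the algorithm prescribes it, which is exactly what the next question is about.

```python
import heapq
from collections import Counter

def merge_smallest(heap, next_order):
    """Pop the two lowest entries and push their merged node back."""
    w1, _, left = heapq.heappop(heap)
    w2, _, right = heapq.heappop(heap)
    heapq.heappush(heap, (w1 + w2, next_order, f"[{left}&{right}]"))
    return next_order + 1

freq = Counter("ABCDEE")

# Entries are (weight, order, label). `order` is an assumed tie-breaker,
# not something Huffman coding itself defines.
heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freq.items()))]
heapq.heapify(heap)

next_order = len(heap)
next_order = merge_smallest(heap, next_order)  # merges A and B
next_order = merge_smallest(heap, next_order)  # merges C and D

table = sorted(heap)
for weight, _, label in table:
    print(weight, label)
```

With this particular tie-break the heap ends up holding E, [A&B] and [C&D], all with weight 2, so the next merge would pick E and [A&B]. A different tie-break would pick a different pair.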
Question 2) Again... in which order should I merge the trees? E.g. [A&B]&E or [A&B]&[C&D]?
Because if I merge [A&B]&E first, the tree will look like this:
        4:[A&B&E]
        /       \
    2:[A&B]     2:E
    /    \
  1:A    1:B
(Question 3) And how do I decide whether 2:E should be on the left or on the right?)
And after joining [C&D], the final tree looks like this:
          6:[A&B&C&D&E]
          /           \
     2:[C&D]        4:[A&B&E]
     /    \          /     \
   1:C    1:D    2:[A&B]   2:E
                 /    \
               1:A    1:B
BUT if I start with joining [A&B]&[C&D]:
          4:[A&B&C&D]
          /         \
    2:[A&B]         2:[C&D]
    /    \          /    \
  1:A    1:B      1:C    1:D
And then join E, the final tree looks like this:
      6:[A&B&C&D&E]
      /           \
    2:E         4:[A&B&C&D]
                /         \
          2:[A&B]         2:[C&D]
          /    \          /    \
        1:A    1:B      1:C    1:D
So in the first variant E would be 11 and in the second variant 0. Or, as another example, C would be 00 vs. 110...
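To double-check these codes, here is a minimal Python sketch that reads them off both final trees. The nested-tuple representation and the names `codes`, `variant_1` and `variant_2` are my own assumptions for illustration; left branches are taken as 0 and right branches as 1, matching the codes above.

```python
def codes(tree, prefix=""):
    """Return {symbol: bitstring} for a tree of nested (left, right) tuples."""
    if isinstance(tree, str):  # a leaf holds a single symbol
        return {tree: prefix or "0"}
    left, right = tree
    return {**codes(left, prefix + "0"), **codes(right, prefix + "1")}

# First variant: [A&B]&E merged first, then joined with [C&D].
variant_1 = (("C", "D"), (("A", "B"), "E"))
# Second variant: [A&B]&[C&D] merged first, then joined with E.
variant_2 = ("E", (("A", "B"), ("C", "D")))

text = "ABCDEE"
for tree in (variant_1, variant_2):
    table = codes(tree)
    total_bits = sum(len(table[ch]) for ch in text)
    print(table, total_bits)
```

Both variants need 14 bits in total for ABCDEE, even though the individual codes differ.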
I think there must be an elementary rule I'm missing here, because Huffman Coding has to be deterministic (to decode it properly), doesn't it!?