I'm trying to code a Huffman encoder as a bit of homework, and I'm a bit confused about how I should start creating the Huffman tree. I am aware that the Huffman algorithm takes the two lowest frequencies and makes them into a tree with the sum of their frequencies as the parent.

In my main method, I have the symbols with their probabilities:




import java.util.PriorityQueue;

public final class TCode {

    private CodeItem[] item = null;

    public final static class CodeItem {

        private String symbol;
        private double probability; 
        private String encoding; 

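        public CodeItem(String symbol, double probability, String encoding) {
            // Validate before calling trim(); otherwise a null symbol
            // throws NullPointerException instead of IllegalArgumentException.
            if (symbol == null || symbol.trim().length() == 0 || probability < 0.0)
                throw new IllegalArgumentException();
            this.symbol = symbol.trim();
            this.probability = probability;
            this.encoding = encoding;
            if (!is01())
                throw new IllegalArgumentException();
        }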

        public CodeItem(String symbol, double probability) {
            this(symbol, probability, null);
        }

        public String getSymbol() {
            return symbol;
        }

        public double getProbability() {
            return probability;
        }

        public String getEncoding() {
            return encoding;
        }

        public void setEncoding(String encoding) {
            this.encoding = encoding;
        }

        public boolean is01() {

            if (encoding == null || encoding.length() == 0)
                return true;

            for (int i = 0; i < encoding.length(); ++i)
                if ("01".indexOf(encoding.charAt(i)) < 0)
                    return false;

            return true;
        }

    }

    public TCode(CodeItem[] codeItem) {

        if (codeItem == null || codeItem.length == 0)
            throw new IllegalArgumentException();

        double sum = 0.0;
        for (int i = 0; i < codeItem.length; ++i) {
            sum += codeItem[i].probability;
            if (codeItem[i].probability == 0.0)
                throw new IllegalArgumentException();
        }
        if (Math.abs(sum - 1.0) > 1e-10)
            throw new IllegalArgumentException();

        item = new CodeItem[codeItem.length];
        for (int i = 0; i < codeItem.length; ++i)
            item[i] = codeItem[i];

    }

    public boolean is01() {

        for (int i = 0; i < item.length; ++i)
            if (!item[i].is01())
                return false;

        return true;
    }

    public double entropy() {

        double result = 0.0;

        for (int i = 0; i < item.length; ++i)
            result += item[i].probability * (-Math.log(item[i].probability) / Math.log(2.0));

        return result;
    }

    public double averageWordLength() {

        double result = 0.0;

        for (int i = 0; i < item.length; ++i)
            result += item[i].encoding.length() * item[i].probability;

        return result;
    }

    public boolean isPrefixCode() {

        for (int i = 1; i < item.length; ++i)
            for (int j = 0; j < i; ++j)
                if (item[i].encoding.startsWith(item[j].encoding) || item[j].encoding.startsWith(item[i].encoding))
                    return false;
        return true;
    }

    public int size() {
        return item.length;
    }

    public CodeItem getAt(int index) {
        return item[index];
    }

    public CodeItem getBySymbol(String symbol) {

        for (int i = 0; i < item.length; ++i) {
            if (item[i].symbol.equals(symbol))
                return item[i];
        }
        return null;
    }


}

Oliver

1 Answer

If I understand it correctly, you start forming binary trees with the two lowest frequencies and add their values to create the parent. Then, work your way up. In your case, the two lowest are:

("A", 0.12) and ("D", 0.13) add to 0.25

At this point you will have:

     (0.25)
     /    \ 
    /      \
   /        \
D(0.13)   A(0.12)

Then, since you still have two nodes with frequencies lower than 0.25, you'll create another binary tree from those (B & E)

     (0.35)
     /    \ 
    /      \
   /        \
B(0.19)   E(0.16)

and the parent of this tree is then joined with the parent of the previous tree

     (0.60)
     /    \ 
    /      \
   /        \
B&E(0.35)   A&D(0.25)

Lastly, C is joined

      (1.00)
      /    \ 
     /      \
    /        \
ADBE(0.60)  C(0.40)

Your tree should look like this:

                  (1.00)
                  /    \ 
                 /      \
                /        \
            ADBE(0.60)  C(0.40)
            /        \ 
           /          \
          /            \
   BE(0.35)          AD(0.25)
     /    \            /    \ 
    /      \          /      \
   /        \        /        \
B(0.19)  E(0.16)  D(0.13)   A(0.12)

In summary, you form a binary tree from the two nodes with the lowest frequencies, and the frequency of the parent is the sum of its children. The resulting node goes back into the pool and is treated like any other node: at every step you again join the two lowest-frequency nodes, whether they are single symbols or previously built subtrees. For example, after the A&D node (0.25) was formed, B (0.19) and E (0.16) were still the two lowest, so they were joined first; only then were A&D (0.25) and B&E (0.35) the two lowest, and they were joined with each other. If B or E had a frequency higher than 0.25, only the one with the lower frequency would have been joined with A&D.
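The merging rule above maps naturally onto a `java.util.PriorityQueue`, which the question already imports. What follows is only a minimal sketch under stated assumptions: `HuffmanSketch`, `Node`, `buildTree` and `assignCodes` are hypothetical names that are not part of the `TCode` class, and the probabilities are the ones used in the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper class for illustration; not part of TCode.
final class HuffmanSketch {

    // A leaf carries a symbol; an internal node carries two children.
    static final class Node implements Comparable<Node> {
        final String symbol;      // null for internal nodes
        final double probability; // for internal nodes: sum of both children
        final Node left;
        final Node right;

        Node(String symbol, double probability) {
            this.symbol = symbol;
            this.probability = probability;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {
            this.symbol = null;
            this.probability = left.probability + right.probability;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null;
        }

        @Override
        public int compareTo(Node other) {
            return Double.compare(probability, other.probability);
        }
    }

    // Repeatedly joins the two lowest-probability nodes until one root remains.
    // Returns null for empty input.
    static Node buildTree(String[] symbols, double[] probabilities) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (int i = 0; i < symbols.length; ++i)
            queue.add(new Node(symbols[i], probabilities[i]));

        while (queue.size() > 1) {
            Node lowest = queue.poll();
            Node secondLowest = queue.poll();
            queue.add(new Node(lowest, secondLowest)); // parent goes back into the pool
        }
        return queue.poll();
    }

    // Walks the tree: "0" for a left edge, "1" for a right edge (an arbitrary choice).
    static void assignCodes(Node node, String prefix, Map<String, String> codes) {
        if (node.isLeaf()) {
            // A single-symbol alphabet would otherwise get an empty code.
            codes.put(node.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assignCodes(node.left, prefix + "0", codes);
        assignCodes(node.right, prefix + "1", codes);
    }

    public static void main(String[] args) {
        String[] symbols = {"A", "B", "C", "D", "E"};
        double[] probabilities = {0.12, 0.19, 0.40, 0.13, 0.16};

        Node root = buildTree(symbols, probabilities);
        Map<String, String> codes = new LinkedHashMap<>();
        assignCodes(root, "", codes);
        System.out.println(codes);
    }
}
```

With these inputs C ends up one edge below the root and A, B, D and E three edges below it, which matches the tree drawn above. Which child goes left or right is arbitrary, so the exact bit patterns may differ from another implementation while the code lengths stay the same.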

hfontanez
  • Yes exactly, and then I need to implement it into my Huffman class somehow. I am gonna have a look around on methods how to do that. – Oliver Apr 04 '21 at 07:55
  • @Oliver, one way is to create a function, for instance `createParentNode`, that takes two elements (i.e. `child1`, `child2`) and returns a new element that you add to a new set. Of course, the two child elements must be removed from your original collection before calling the method. To make this simpler, change `CodeItem[] item` to a collection (probably a `TreeSet`, so the elements can be sorted more easily). Since the `parent` collection is originally empty, you'll grab two elements to start. Then, you'll compare the `parent` set's first element to `items`. Follow the logic I described. – hfontanez Apr 04 '21 at 12:51
  • @Oliver remember, you are done when you have iterated through your original list, or if you follow my advice, when the original list is empty. – hfontanez Apr 04 '21 at 13:00
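
One possible reading of the `createParentNode` idea from the comments is a two-collection loop: a sorted collection of the original items plus a second collection of parents, always joining the two lowest heads. The sketch below is an interpretation, not the commenter's code. `TwoQueueSketch`, `Node`, `leaf` and `takeLowest` are hypothetical names, and `ArrayDeque` is swapped in for the suggested `TreeSet` because a set ordered by probability would drop items with equal probabilities. The loop here also runs until a single node remains across both collections, since parents may still need joining after the original items are used up.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;

// Hypothetical helper class for illustration; not part of TCode.
final class TwoQueueSketch {

    static final class Node {
        final String label;       // symbol, or concatenated symbols for a parent
        final double probability;
        final Node left;
        final Node right;

        Node(String label, double probability, Node left, Node right) {
            this.label = label;
            this.probability = probability;
            this.left = left;
            this.right = right;
        }
    }

    static Node leaf(String symbol, double probability) {
        return new Node(symbol, probability, null, null);
    }

    // The parent's probability is the sum of its two children.
    static Node createParentNode(Node child1, Node child2) {
        return new Node(child1.label + child2.label,
                child1.probability + child2.probability, child1, child2);
    }

    // Removes and returns whichever head has the lower probability (null if both are empty).
    static Node takeLowest(Deque<Node> items, Deque<Node> parents) {
        if (parents.isEmpty()) return items.poll();
        if (items.isEmpty()) return parents.poll();
        return items.peek().probability <= parents.peek().probability
                ? items.poll() : parents.poll();
    }

    static Node buildTree(Node[] leaves) {
        Node[] sorted = leaves.clone();
        Arrays.sort(sorted, Comparator.comparingDouble((Node n) -> n.probability));

        Deque<Node> items = new ArrayDeque<>(Arrays.asList(sorted));
        // Parents are created in non-decreasing order, so a FIFO queue stays sorted.
        Deque<Node> parents = new ArrayDeque<>();

        while (items.size() + parents.size() > 1) {
            Node child1 = takeLowest(items, parents);
            Node child2 = takeLowest(items, parents);
            parents.add(createParentNode(child1, child2));
        }
        return takeLowest(items, parents);
    }
}
```

Called with the five symbols from the answer, `buildTree` should perform the same four joins as the drawings: A with D, then E with B, then those two parents, and finally C with the 0.60 subtree.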