-2

Given a vector V in Matlab I want to calculate the code length without generating the code...

v = [0.1,0.1,0.1,0.2,0.2,0.3]; the code length = 17. How can I calculate it without generating the code.

Thanks

Maroun Sassine
  • 319
  • 5
  • 14
  • What is the "code length"? Do you mean the length of the symbol that produces the maximum length? – rayryeng Jun 12 '15 at 17:44
  • The vector's generated code has a length as shown above. v's generated code has length of 17 bits. Can I know the length of v's generated code without generating the huffman code ? – Maroun Sassine Jun 12 '15 at 17:50
  • it is the sum off all symbols – Maroun Sassine Jun 12 '15 at 17:51
  • I don't think it's possible without actually constructing the probability mass function of all of the symbols and then recursively determining the codes per symbol. It's a dynamic algorithm where the symbols are computed based on the PMF. I can't see a closed-form solution to what you're asking by just considering your data. Instead I'm going to ask you another question: **Why** is it necessary for you to do this? Why don't you want to construct the Huffman table? – rayryeng Jun 12 '15 at 17:59

2 Answers2

1

So there are six symbols? Then the maximum code length can't be 17. The maximum code length with six symbols for any set of frequencies is five bits. (0, 10, 110, 1110, 11110, 11111).

For that particular set of probabilities, assuming one symbol per probability and that the probabilities are exact, you can get two different codes depending on the choices made when executing the Huffman algorithm. One has a maximum length of 3, the other a maximum length of 4. Both codes are equally optimal in coding the symbols. The two codes have code lengths, in the same frequency order, (4,4,3,2,2,2) and (3,3,3,3,2,2).

You may mean the sum of the bits over the six possible symbols, which is in fact 17 for one of the codes, but 16 for the other. However that is a meaningless measure, since you have used each symbol once, in contradiction to their stated probabilities. A useful measure would be multiplying each symbol length in bits by the probability to get an average symbol length in bits. That is 2.5 bits for both of those codes. That is how you verify that both codes are equally optimal.

In general you need to apply the Huffman algorithm in order to determine the maximum code length. There is no other shortcut. You can traverse the tree to find the maximum length. You do not need to explicitly generate the code per se, but the code is implied by the tree.

You can compute the entropy to get a lower bound on the average symbol length in bits. That is the sum of each probability times its negative base-2 logarithm. In this case, the entropy is 2.446.

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
  • 100% agree. +1. I tried asking the OP why they even want to entertain this question... No replies yet. – rayryeng Jun 13 '15 at 00:22
0

Maybe I wasn't clear with my question, but I think this code will return the minimal code length for a vector of data 'v'.

% return the huffman lenght of a matrix
function S = hufman_length(v)


    v = (v(:));
    v = hist(v,256);
    v = v(find(v>0));  
    S = 0;
    %acumulating the probability
    while (length(v) >= 2)
    v = sort(v); 
    S = S + v(1) + v(2);
    v(2) = v(1) + v(2);
    v = v(2:length(v));

    end


end
Maroun Sassine
  • 319
  • 5
  • 14