
Before I start describing my problem, I would like to note that this question is part of a project for one of my courses at University, so I am not looking for the full solution, rather for a hint or an explanation.

So, let's assume that there are 3 states {1,2,3} and that I also have the transition probability matrix (3x3). I wrote a MATLAB script that, based on the transition matrix, creates a vector with N samples of the Markov chain. Assume that the first state is state 1. Now, I need to Huffman code this chain based on the conditional distribution p(Xn | Xn-1).

If I am not mistaken, I think that I have to create 3 Huffman dictionaries and encode each symbol of the chain based on the previous state(?). That would mean each symbol is encoded with one of the three dictionaries, but not all of them with the same dictionary.
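To make that idea concrete, here is a minimal sketch of the encoding step for a hypothetical 3-state chain. The codewords are hand-picked for illustration only (a real script would build them from the transition rows, e.g. with huffmandict); dicts{s} plays the role of the dictionary conditioned on previous state s:

```octave
% Hand-picked (hypothetical) prefix-free codewords for illustration;
% dicts{s}{x} is the codeword for symbol x when the previous state is s.
dicts = { { [0], [1 0], [1 1] }, ...   % dictionary used after state 1
          { [1 0], [0], [1 1] }, ...   % dictionary used after state 2
          { [1 1], [1 0], [0] } };     % dictionary used after state 3

chain = [1 2 2 3 1];     % sample path; chain(1)=1 is known to the decoder
bits  = [];
for n = 2:length(chain)
    prev = chain(n-1);                    % pick dictionary by previous state
    bits = [bits, dicts{prev}{chain(n)}]; % append that symbol's codeword
end
disp(bits)   % here the stream comes out as 1 0 0 1 1 1 1
```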

If the encoding process is correct, how do I decode the coded vector?

I am not really sure if that's how it should be done.

Any ideas would be appreciated. Thanks in advance!

2 Answers


That's right. There would be a Huffman code for the three symbols built from the probabilities p11, p12, and p13, another built from p21, p22, p23, etc.

Decoding chooses which code to use based on the current state. There either needs to be an agreed-upon starting state, or the starting state needs to be transmitted.
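A minimal sketch of that decoding loop, using hand-picked prefix-free codewords for a hypothetical 3-state chain (not the question's matrix), might look like this:

```octave
% Hand-picked (hypothetical) prefix-free codewords; dicts{s}{x} is the
% codeword for symbol x when the current (last decoded) state is s.
dicts = { { [0], [1 0], [1 1] }, ...
          { [1 0], [0], [1 1] }, ...
          { [1 1], [1 0], [0] } };

bits  = [1 0 0 1 1 1 1];    % encoded stream
state = 1;                  % agreed-upon starting state
out   = state;
while ~isempty(bits)
    for x = 1:3   % find which codeword of the current dictionary is a prefix
        cw = dicts{state}{x};
        if length(bits) >= length(cw) && isequal(bits(1:length(cw)), cw)
            bits  = bits(length(cw)+1:end);  % consume the codeword
            state = x;                       % decoded symbol = next state
            out   = [out, x];
            break
        end
    end
end
disp(out)   % recovers the chain, here 1 2 2 3 1
```

Because each per-state code is prefix-free, exactly one codeword of the current dictionary can match the front of the stream, so the loop never misaligns.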

However, this case is a little odd, since there is only one possible Huffman code shape for three symbols: 1 bit, 2 bits, and 2 bits, e.g. 0, 10, 11. So the only gain you get is by assigning the one-bit codeword to the highest-probability symbol.
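As a quick worked example of that gain (the transition row below is my own illustrative choice, not from the question): with the codeword lengths fixed at 1, 2 and 2, the expected bits per symbol leaving a state is minimized by giving the 1-bit codeword to the most probable next state:

```octave
row = [0.6 0.3 0.1];          % hypothetical transition row out of some state
p   = sort(row,'descend');    % most probable symbol gets the 1-bit codeword
len = [1 2 2];                % the only possible codeword lengths for 3 symbols
avg = sum(p .* len)           % 0.6*1 + 0.3*2 + 0.1*2 = 1.4 bits per symbol
```

Any other assignment of the same three codewords gives at least as many expected bits, which is all the Huffman construction can buy in the 3-symbol case.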

Mark Adler
  • Thanks for your answer. I still have a question though. "Decoding chooses which code to use based on the current state". Let's say that I start decoding based on the dictionary of the current state's symbol. Do I get the next symbol and repeat the process for N steps? – Sotiris Dimitras Apr 22 '18 at 19:50
  • You are in state 1, 2, or 3. You use the corresponding Huffman code out of the three you have to choose from, to decode the next few bits, and you will end up in state 1, 2, or 3. Repeat. – Mark Adler Apr 22 '18 at 19:59
  • Great, thanks again! I am going to test it and see how it's going. I will post my results after I finish it! – Sotiris Dimitras Apr 22 '18 at 20:03

Well, having solved the problem above, I decided to post the answer with the Octave script in case anyone needs it in the future.

So, let's assume that there are 5 states {1,2,3,4,5} and that I also have the transition probability matrix (5x5). I Huffman encoded and decoded the Markov chain for 1000 Monte Carlo experiments.

The Octave Script is:

%starting State of the chain
starting_value = 1;
%Chain Length
chain_length = 100;

%# of Monte Carlo experiments
MC=1000;

%Variable to count all correct coding/encoding experiments
count=0;

%Create unique symbols (the five states).
symbols = 1:5;

%Transition matrix: row i holds the distribution of the next state
%given that the current state is i.
T = [0.5   0.125 0.125  0.125  0.125;
     0.25  0.125 0.0625 0.0625 0.5;
     0.25  0.125 0.125  0.25   0.25;
     0.125 0     0.5    0.25   0.125;
     0     0.5   0.25   0.25   0];

%One Huffman dictionary per state, built from that state's transition row.
dict1 = huffmandict(symbols,T(1,:));
dict2 = huffmandict(symbols,T(2,:));
dict3 = huffmandict(symbols,T(3,:));
dict4 = huffmandict(symbols,T(4,:));
dict5 = huffmandict(symbols,T(5,:));

%Initialize Markov chain
chain = zeros(1,chain_length);
chain(1)=starting_value;

for mc=1:MC
    comp=[];
    dsig=[];
    %Create Markov Chain
    for i=2:chain_length
        this_step_distribution = T(chain(i-1),:);
        cumulative_distribution = cumsum(this_step_distribution);

        r = rand();

        chain(i) = find(cumulative_distribution>r,1);
    end

    %Collect the dictionaries in a cell array so the previous state can
    %index them directly.
    dicts = {dict1, dict2, dict3, dict4, dict5};

    %Encode the first symbol with dict1. This is redundant, since the
    %decoder already knows the starting state; the bits are stripped
    %again before decoding.
    comp = huffmanenco(chain(1),dict1);
    %Encode each remaining symbol with the dictionary of its previous state.
    for i=2:chain_length
        comp = horzcat(comp,huffmanenco(chain(i),dicts{chain(i-1)}));
    end

    %Decode the data. Verify that the decoded data matches the original data.
    dsig(1)=starting_value;
    %Strip the redundant codeword of the known starting symbol.
    comp=comp(length(dict1{starting_value})+1:end);
    for i=2:chain_length
        %huffmandeco decodes the whole remaining stream, but only the
        %first symbol is guaranteed to use the current dictionary, so
        %keep temp(1) and drop the rest.
        d = dicts{dsig(end)};
        temp = huffmandeco(comp,d);
        %Remove the consumed codeword from the front of the stream.
        comp = comp(length(d{temp(1)})+1:end);
        dsig = horzcat(dsig,temp(1));
    end
    count=count+isequal(chain,dsig);
end

count

The variable count verifies that in every one of the MC experiments the generated Markov chain was properly encoded and decoded. (Obviously, if count equals 1000, then all the experiments had correct results.)