I want to estimate the transition probability matrix for a first-order Markov chain from a given set of data sequences (i.e. clickstream data), preferably in Java; otherwise MATLAB is fine.

I have each sequence in a different file (but of course I can merge everything into a single one), and one of the issues is that the sequences do not have a standard length. I know the state space and I'm only interested in the state transitions.

I've read this: Estimate Markov Chain Transition Matrix in MATLAB With Different State Sequence Lengths, but I'm not sure it fits my problem. I was also wondering whether there are Java libraries that handle this kind of problem. If there are, I wasn't able to find them.

Any

1 Answer

You have to create a matrix which counts transitions.

For the sequence 1,4,4,6,7

you have to set

M(1,4)=M(1,4)+1
M(4,4)=M(4,4)+1
M(4,6)=M(4,6)+1
M(6,7)=M(6,7)+1

Finally, normalize every row so that it sums to 1.
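Since the question asks for Java, here is a minimal sketch of the same count-then-normalize idea. The class and method names (`TransitionCounter`, `countTransitions`, `normalizeRows`) are made up for illustration, and it assumes states are numbered 1 to `numStates` as in the example above. Leaving a row of zeros for a state that is never left is also just one possible convention.

```java
import java.util.List;

// Hypothetical helper class; names and signatures are illustrative only.
final class TransitionCounter {

    // Counts transitions between states numbered 1..numStates over all sequences.
    static int[][] countTransitions(List<int[]> sequences, int numStates) {
        int[][] counts = new int[numStates][numStates];
        for (int[] seq : sequences) {
            // Sequences may have any length; one shorter than 2 contributes nothing.
            for (int t = 0; t + 1 < seq.length; t++) {
                counts[seq[t] - 1][seq[t + 1] - 1]++;
            }
        }
        return counts;
    }

    // Divides each row by its sum; rows with no observed transitions stay all zero.
    static double[][] normalizeRows(int[][] counts) {
        int n = counts.length;
        double[][] p = new double[n][n];
        for (int i = 0; i < n; i++) {
            long rowSum = 0;
            for (int c : counts[i]) {
                rowSum += c;
            }
            if (rowSum == 0) {
                continue; // state i was never left in the data
            }
            for (int j = 0; j < n; j++) {
                p[i][j] = (double) counts[i][j] / rowSum;
            }
        }
        return p;
    }
}
```

Because the counts are accumulated per sequence, sequences of different lengths can simply be processed one after another (one file at a time, for instance) without creating a spurious transition from the end of one sequence to the start of the next.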

Update: using char indices. MATLAB can convert any character to a number using double('A'), so this is just simple index shifting.

char2index=@(x)(double(x)-'A'+1)
index2char=@(x)(char(x+'A'-1))
M(char2index('A'),char2index('B'))=M(char2index('A'),char2index('B'))+1

The second function index2char transforms indices back to the character.
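For state labels longer than one character, the index shifting can be replaced by a lookup table from label to matrix index. Below is a rough Java sketch of that idea, given that the question mentions Java. The names `LabelledTransitionCounter`, `indexStates` and `countTransitions` are hypothetical, and the sketch assumes the state space is known in advance, as stated in the question. It only builds the count matrix; the rows still need to be normalized afterwards.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; names and signatures are illustrative only.
final class LabelledTransitionCounter {

    // Assigns each label of the known state space a 0-based matrix index,
    // in the order the labels are listed.
    static Map<String, Integer> indexStates(List<String> stateSpace) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String state : stateSpace) {
            index.putIfAbsent(state, index.size());
        }
        return index;
    }

    // Counts transitions between labelled states over all sequences.
    static int[][] countTransitions(List<List<String>> sequences,
                                    Map<String, Integer> index) {
        int n = index.size();
        int[][] counts = new int[n][n];
        for (List<String> seq : sequences) {
            for (int t = 0; t + 1 < seq.size(); t++) {
                counts[lookup(index, seq.get(t))][lookup(index, seq.get(t + 1))]++;
            }
        }
        return counts;
    }

    // Fails loudly on a label that is not part of the declared state space.
    private static int lookup(Map<String, Integer> index, String state) {
        Integer i = index.get(state);
        if (i == null) {
            throw new IllegalArgumentException("Unknown state: " + state);
        }
        return i;
    }
}
```

A `LinkedHashMap` is used here only so that the row and column order follows the order in which the states were listed, which makes the resulting matrix easier to read.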

Daniel
  • Thank you. That was really easy. I have another question, though. Can you think of an implementation that works for states defined as strings (i.e. A, B, C, D) instead of numbers? I've written a simple MATLAB implementation for a sequence of numbers (I've also found the hmmestimate function, which works pretty well), but I can't think of a way to deal with strings. Thanks for your kindness – Any Nov 21 '13 at 09:21
  • I updated my answer. If you only need labels from `A` to `Z` (one character), this is enough. For more labels I would solve it using `containers.Map` – Daniel Nov 21 '13 at 12:21