Understanding how to construct a higher-order Markov chain

Question

Suppose I want to predict whether a person belongs to class 1 (healthy) or class 2 (fever). I have a data set whose observations take values in the domain {normal, cold, dizzy}.

The transition matrix would contain the transition probabilities estimated from our training data set, while the initial vector would contain the probability that a person starts (day 1) in a state x from the domain {normal, cold, dizzy}; this is also estimated from the training set.

If I want to build a first-order Markov chain, I would generate a 3x3 transition matrix and a 1x3 initial vector per class, like so:

> TransitionMatrix
       normal cold dizzy
normal     NA   NA    NA
cold       NA   NA    NA
dizzy      NA   NA    NA

>Initial Vector
     normal cold dizzy
[1,]     NA   NA    NA

The NA entries will be filled with the corresponding probabilities.

1- My question is about transition matrices in higher-order chains. For example, in a second-order MC, would we have a transition matrix of size domain² x domain², like so:

               normal->normal normal->cold normal->dizzy cold->normal cold->cold cold->dizzy dizzy->normal dizzy->cold dizzy->dizzy
normal->normal             NA           NA            NA           NA         NA          NA            NA          NA           NA
normal->cold               NA           NA            NA           NA         NA          NA            NA          NA           NA
normal->dizzy              NA           NA            NA           NA         NA          NA            NA          NA           NA
cold->normal               NA           NA            NA           NA         NA          NA            NA          NA           NA
cold->cold                 NA           NA            NA           NA         NA          NA            NA          NA           NA
cold->dizzy                NA           NA            NA           NA         NA          NA            NA          NA           NA
dizzy->normal              NA           NA            NA           NA         NA          NA            NA          NA           NA
dizzy->cold                NA           NA            NA           NA         NA          NA            NA          NA           NA
dizzy->dizzy               NA           NA            NA           NA         NA          NA            NA          NA           NA

Here the cell (1,1) represents the following sequence: normal->normal->normal->normal

Or would it instead be just domain² x domain, like so:

               normal cold dizzy
normal->normal     NA   NA    NA
normal->cold       NA   NA    NA
normal->dizzy      NA   NA    NA
cold->normal       NA   NA    NA
cold->cold         NA   NA    NA
cold->dizzy        NA   NA    NA
dizzy->normal      NA   NA    NA
dizzy->cold        NA   NA    NA
dizzy->dizzy       NA   NA    NA

Here the cell (1,1) represents normal->normal->normal, which is different from the previous representation.

2- What about the initial vector for an MC of order 2? Would we need two initial vectors of size 1 x domain, like so:

     normal cold dizzy
[1,]     NA   NA    NA

leading to two initial vectors per class: the first giving the probability of occurrence of {normal, cold, dizzy} on the first day for the healthy/fever class, and the second giving the probability of occurrence on the second day. This would give 4 initial vectors in total.

Or would we need just one initial vector of size 1 x domain², like so:

    normal->normal normal->cold normal->dizzy cold->normal cold->cold cold->dizzy dizzy->normal dizzy->cold dizzy->dizzy
[1,]             NA           NA            NA           NA         NA          NA            NA          NA           NA

I can see how the second way of representing the initial vector would be problematic in case we want to classify an observation with only one state.

score 3 · Accepted Answer · answered Aug 15 '16 at 11:08

Say the set of states is S. Typically, for an nth-order chain:

- The transition matrix has dimensions |S|ⁿ × |S| (your second option). This is because, given the history of the last n states, we need the probability of the single next state. It is true that this single next state induces another compound state of history n, but the transition itself is to the single next state. See, e.g., the example in Wikipedia.
- The initial distribution is a distribution over |S|ⁿ elements (your second option).
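As a rough illustration of the two shapes above for n = 2, here is a minimal Python sketch that estimates both objects by counting. The function name `fit_second_order`, the dictionary layout, and the toy sequences are assumptions made for this example, not something from the question; one such model would be fitted per class.

```python
from collections import Counter
from itertools import product

STATES = ["normal", "cold", "dizzy"]  # the domain S from the question


def fit_second_order(sequences, states=STATES):
    """Estimate a second-order chain from a list of state sequences.

    Returns (initial, transition):
      initial[(a, b)]       -> P(day1 = a, day2 = b), over |S|^2 pairs
      transition[(a, b)][c] -> P(next = c | last two = a, b), |S|^2 x |S|
    """
    pairs = list(product(states, repeat=2))
    init_counts = Counter()
    trans_counts = {p: Counter() for p in pairs}

    for seq in sequences:
        if len(seq) < 2:
            continue  # too short to contribute a starting pair
        init_counts[(seq[0], seq[1])] += 1
        # Slide a window of three consecutive days over the sequence.
        for a, b, c in zip(seq, seq[1:], seq[2:]):
            trans_counts[(a, b)][c] += 1

    n_init = sum(init_counts.values())
    initial = {p: init_counts[p] / n_init for p in pairs} if n_init else {}

    transition = {}
    for p in pairs:
        total = sum(trans_counts[p].values())
        # Histories never seen in training are left as None rather than guessed.
        transition[p] = (
            {s: trans_counts[p][s] / total for s in states} if total else None
        )
    return initial, transition


# Made-up toy data for one class, one list per person.
healthy_sequences = [
    ["normal", "normal", "cold", "normal"],
    ["normal", "normal", "normal", "dizzy"],
]
initial, transition = fit_second_order(healthy_sequences)
```

Each of the 9 rows of `transition` holds only 3 probabilities, which is why the domain² x domain² layout is unnecessary: from the history (a, b), the only reachable compound states are (b, c) for the 3 choices of c. Unseen histories are left as `None` here; in practice some smoothing would likely be needed.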

Ami Tavory

For the second one, suppose I have an observation with only one state, say `normal`. How would I retrieve that from the initial distribution? Should I sum over all of these: `normal->normal normal->cold normal->dizzy`? (Not sure if I made this question clear enough.) – Imlerith Aug 15 '16 at 11:21
@Imlerith If IIUC your question, it is how to calculate the marginal distribution from the joint distribution. This is [well known](http://stats.stackexchange.com/questions/54472/given-a-table-defining-the-joint-probabilities-how-do-i-calculate-certain-param). – Ami Tavory Aug 15 '16 at 11:24
I am currently learning higher-order Markov chains; do you have any good literature to recommend? I have trouble finding it. I mostly find PowerPoint slides online, but without any references. I'm interested in what the transition matrix looks like for an absorbing higher-order Markov chain, meaning that one or more states are impossible to leave once the chain is in them. Thanks :) – Developer Feb 15 '17 at 12:58
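On the single-state observation raised in the comments above: the marginal for day 1 is obtained by summing the joint initial distribution over the second day. A minimal Python sketch follows; the helper name `first_day_marginal` and the numbers in `joint` are hypothetical and chosen only for illustration.

```python
from itertools import product

STATES = ["normal", "cold", "dizzy"]  # the domain from the question


def first_day_marginal(initial_pairs, states=STATES):
    """Sum the joint P(day1 = a, day2 = b) over b to get P(day1 = a)."""
    return {
        a: sum(initial_pairs.get((a, b), 0.0) for b in states)
        for a in states
    }


# Hypothetical joint initial distribution over the 9 pairs (sums to 1).
joint = {p: 0.0 for p in product(STATES, repeat=2)}
joint[("normal", "normal")] = 0.5
joint[("normal", "cold")] = 0.25
joint[("cold", "dizzy")] = 0.25

marginal = first_day_marginal(joint)
```

With these made-up numbers, `marginal["normal"]` is the sum over `normal->normal`, `normal->cold` and `normal->dizzy`, which is the sum the comment asks about.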
