Suppose I want to predict whether a person belongs to class 1 (healthy) or class 2 (fever). I have a data set whose states come from the following domain: {normal, cold, dizzy}.
The transition matrix would contain the transition probabilities estimated from our training data set, while the initial vector would contain the probability that a person starts (day 1) in a state x from the domain {normal, cold, dizzy}; this is also estimated from our training set.
If I want to build a first-order Markov chain, I would generate a 3x3 transition matrix and a 1x3 initial vector per class, like so:
> TransitionMatrix
       normal cold dizzy
normal     NA   NA    NA
cold       NA   NA    NA
dizzy      NA   NA    NA
> Initial Vector
     normal cold dizzy
[1,]     NA   NA    NA
The NA entries will be filled with the corresponding probabilities.
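To make the first-order setup concrete, here is a minimal Python sketch of how the NA cells could be filled by counting, done once per class. The function name `fit_first_order` and the training sequences are hypothetical, and plain relative frequencies (no smoothing) are an assumption on my part.

```python
from collections import Counter

STATES = ["normal", "cold", "dizzy"]

def fit_first_order(sequences, states=STATES):
    """Estimate the initial vector and transition matrix by counting."""
    # Assumes at least one non-empty sequence.
    first = Counter(seq[0] for seq in sequences if seq)
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))  # consecutive (from, to) pairs

    n = sum(first.values())
    initial = [first[s] / n for s in states]

    transition = []
    for a in states:
        row_total = sum(pairs[(a, b)] for b in states)
        # A row with no observed transitions stays None (like NA).
        transition.append(
            [pairs[(a, b)] / row_total if row_total else None for b in states]
        )
    return initial, transition

# Hypothetical training sequences for one class (e.g. healthy).
healthy = [
    ["normal", "normal", "cold"],
    ["normal", "cold", "normal"],
    ["cold", "normal", "normal"],
    ["normal", "dizzy", "normal"],
]
initial, transition = fit_first_order(healthy)
```

Here `initial` is the 1x3 vector and `transition` is the 3x3 matrix, both ordered as normal, cold, dizzy.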
1- My question is about transition matrices in higher-order chains. For example, in a second-order Markov chain (MC), would we have a transition matrix of size domain² x domain², like so:
               normal->normal normal->cold normal->dizzy cold->normal cold->cold cold->dizzy dizzy->normal dizzy->cold dizzy->dizzy
normal->normal             NA           NA            NA           NA         NA          NA            NA          NA           NA
normal->cold               NA           NA            NA           NA         NA          NA            NA          NA           NA
normal->dizzy              NA           NA            NA           NA         NA          NA            NA          NA           NA
cold->normal               NA           NA            NA           NA         NA          NA            NA          NA           NA
cold->cold                 NA           NA            NA           NA         NA          NA            NA          NA           NA
cold->dizzy                NA           NA            NA           NA         NA          NA            NA          NA           NA
dizzy->normal              NA           NA            NA           NA         NA          NA            NA          NA           NA
dizzy->cold                NA           NA            NA           NA         NA          NA            NA          NA           NA
dizzy->dizzy               NA           NA            NA           NA         NA          NA            NA          NA           NA
Here the cell (1,1) represents the following sequence: normal->normal->normal->normal.
Or would it instead be just domain² x domain, like so:
               normal cold dizzy
normal->normal     NA   NA    NA
normal->cold       NA   NA    NA
normal->dizzy      NA   NA    NA
cold->normal       NA   NA    NA
cold->cold         NA   NA    NA
cold->dizzy        NA   NA    NA
dizzy->normal      NA   NA    NA
dizzy->cold        NA   NA    NA
dizzy->dizzy       NA   NA    NA
Here the cell (1,1) represents normal->normal->normal, which is different from the previous representation.
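To pin down what I mean by the domain² x domain layout, here is a minimal Python sketch that fills such a table from sliding windows of three consecutive days. The function name `fit_second_order` and the training sequences are hypothetical, and I am assuming plain relative frequencies; this only illustrates the second layout and is not meant to settle which layout is the right one.

```python
from collections import Counter
from itertools import product

STATES = ["normal", "cold", "dizzy"]

def fit_second_order(sequences, states=STATES):
    """Return {(day t-2, day t-1): [P(day t = s) for s in states]}."""
    triples = Counter()
    for seq in sequences:
        triples.update(zip(seq, seq[1:], seq[2:]))  # windows of 3 days

    table = {}
    for history in product(states, repeat=2):  # 9 rows: pairs of states
        counts = [triples[history + (s,)] for s in states]  # 3 columns
        total = sum(counts)
        # A history that never occurs in training stays None (like NA).
        table[history] = [c / total for c in counts] if total else None
    return table

# Hypothetical training sequences for one class (e.g. fever).
fever = [
    ["normal", "normal", "normal", "cold"],
    ["normal", "normal", "cold", "dizzy"],
    ["cold", "dizzy", "dizzy", "dizzy"],
]
table = fit_second_order(fever)
```

In this sketch, `table[("normal", "normal")][0]` is the cell (1,1), i.e. the estimated probability of normal->normal->normal.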
2- What about the initial vector for an MC of order 2? Would we need two initial vectors of size 1 x domain, like so:
     normal cold dizzy
[1,]     NA   NA    NA
leading to two initial vectors per class: the first gives the probability of occurrence of {normal, cold, dizzy} on the first day for the healthy/fever class, while the second gives the probability of occurrence on the second day for the healthy/fever class. This would give 4 initial vectors in total.
Or would we just need one initial vector of size 1 x domain², like so:
     normal->normal normal->cold normal->dizzy cold->normal cold->cold cold->dizzy dizzy->normal dizzy->cold dizzy->dizzy
[1,]             NA           NA            NA           NA         NA          NA            NA          NA           NA
I can see how the second way of representing the initial vector would be problematic if we want to classify an observation that contains only one state.
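For reference, here is a minimal Python sketch of the 1 x domain² variant, together with one possible way to handle a one-state observation: summing the pair vector over day 2 gives back a 1 x domain vector for day 1. The names `fit_initial_pairs` and `first_day_marginal` and the training sequences are hypothetical, and I am not sure this marginalisation is the accepted approach; it is only what I would try.

```python
from collections import Counter
from itertools import product

STATES = ["normal", "cold", "dizzy"]

def fit_initial_pairs(sequences, states=STATES):
    """Estimate P(day 1 = a, day 2 = b) for all len(states)**2 pairs."""
    # Assumes at least one sequence with two or more days.
    pairs = Counter((seq[0], seq[1]) for seq in sequences if len(seq) >= 2)
    n = sum(pairs.values())
    return {p: pairs[p] / n for p in product(states, repeat=2)}

def first_day_marginal(pair_probs, states=STATES):
    """Sum the pair vector over day 2 to get a 1 x domain vector for day 1."""
    return [sum(pair_probs[(a, b)] for b in states) for a in states]

# Hypothetical training sequences for one class.
sequences = [
    ["normal", "normal"],
    ["normal", "cold"],
    ["cold", "normal"],
    ["normal", "normal"],
]
pair_probs = fit_initial_pairs(sequences)
day1 = first_day_marginal(pair_probs)
```

With this, a single-state observation could be scored with `day1`, and anything longer with `pair_probs` for its first two days.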