I have been working on various implementations of Markov chains for a while, and I just want to clarify a generalisation of the chains.
Generation
If I want to generate a sequence of length n, do I simply sample from the initial probabilities, then use the state just generated to pick a row of the transition matrix, and repeat this n-1 times? So if the sample from the initial distribution gives state "A", do I use the "A" row of the transition matrix as the distribution for the next sample?
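That sampling scheme can be sketched as follows; the initial vector and transition matrix here are made-up values for illustration, not taken from any real data:

```python
import random

states = ["A", "C", "G", "T"]

# Hypothetical initial probabilities and transition matrix (each row sums to 1).
initial = [0.25, 0.25, 0.25, 0.25]
transition = {
    "A": [0.1, 0.4, 0.4, 0.1],
    "C": [0.3, 0.2, 0.2, 0.3],
    "G": [0.25, 0.25, 0.25, 0.25],
    "T": [0.4, 0.1, 0.1, 0.4],
}

def generate(n):
    # Sample the first state from the initial distribution...
    seq = random.choices(states, weights=initial, k=1)
    # ...then sample each of the remaining n-1 states from the row of the
    # transition matrix indexed by the previously generated state.
    for _ in range(n - 1):
        seq += random.choices(states, weights=transition[seq[-1]], k=1)
    return "".join(seq)

print(generate(10))
```

So yes: one draw from the initial distribution, then n-1 draws where each draw's weights are the row keyed by the previous state.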
{I have an implementation of Markov chains in R in which, at each iteration, the initial vector is multiplied by the transition matrix, and the transition matrix by itself. Where or when does one apply this matrix multiplication in chain generation? I have been told these products are used for determining the distribution over states after some number of repetitions... but repetitions of what? I just want to generate states for a sequence of a particular length; where does this repetition come in if I am sampling from the transition matrix, based on nucleotide frequencies in the original/input sequence?} - sorted by biziclop below
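For the record, the standard reading of that repeated multiplication is: it is not part of generating a sequence at all. Multiplying the initial row vector by the transition matrix n times yields the distribution over states after n steps of the chain (the "repetitions" are chain steps), which answers "where is the chain likely to be after n steps?" rather than "what sequence did it take?". A minimal sketch in plain Python, again with hypothetical matrices:

```python
# Hypothetical initial distribution and transition matrix (rows sum to 1).
initial = [0.25, 0.25, 0.25, 0.25]
P = [
    [0.1, 0.4, 0.4, 0.1],
    [0.3, 0.2, 0.2, 0.3],
    [0.25, 0.25, 0.25, 0.25],
    [0.4, 0.1, 0.1, 0.4],
]

def step(dist, P):
    # One vector-matrix multiplication: dist @ P.
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# The "repetitions" are chain steps: after n multiplications, the vector
# holds the probability of being in each state at step n.
dist = initial
for _ in range(3):
    dist = step(dist, P)

print(dist)  # still a probability distribution over A, C, G, T
```

Sequence generation only ever needs one row of the matrix at a time, so the two computations serve different purposes.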
Probability of a user-entered sequence
I have seen several implementations here.
Input - "ACGT"
P(ACGT) = P(A) * P(C|A) * P(G|C) * P(T|G)
Does this imply that P(A) comes from the initial/start probabilities, and that the conditional probabilities (P(C|A) etc.) come from the transition matrix?
Or does this imply maximum likelihood estimation, where P(A) = #A's/#nucleotides? And therefore P(C|A) = #(A followed by C) / #A's?
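The two readings coincide when the initial probabilities and transition matrix are themselves maximum likelihood estimates from a training sequence. A hedged sketch (the training string is an assumption for illustration): P(A) is the single-nucleotide frequency, and P(C|A) counts how often A is followed by C, divided by the number of A's that have a successor.

```python
from collections import Counter

train = "ACGTACGGTCA"  # hypothetical training sequence

# Initial/marginal probabilities: #X / #nucleotides.
counts = Counter(train)
initial = {s: counts[s] / len(train) for s in "ACGT"}

# Transition probabilities by MLE: #(X followed by Y) / #(X with a successor).
pair_counts = Counter(zip(train, train[1:]))
from_counts = Counter(train[:-1])
trans = {x: {y: pair_counts[(x, y)] / from_counts[x] for y in "ACGT"}
         for x in "ACGT"}

def seq_prob(seq):
    # P(s1) * P(s2|s1) * P(s3|s2) * ... as in P(ACGT) above.
    p = initial[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= trans[a][b]
    return p

print(seq_prob("ACGT"))
```

Note the denominator for P(C|A) is the count of A's that are followed by something, not the raw count of C's.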
If entries in the transition matrix are zero, do we use Laplacian estimates or other forms of pseudocounts to combat this?
If so, where does one apply the pseudocounts? Does each entry of the transition matrix get an extra count? If we use the transition matrix to generate the probabilities, then the pseudocounts would have to be added there... no?
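The usual convention (an assumption about your setup, but standard add-one/Laplace smoothing) is to add the pseudocount to every cell of the raw count matrix before normalising the rows, not to the already-normalised probabilities. A sketch with a hypothetical training string that never exhibits certain transitions:

```python
from collections import Counter

train = "AACCA"  # hypothetical: e.g. A->G is never observed
states = "ACGT"

pair_counts = Counter(zip(train, train[1:]))

def smoothed_trans(alpha=1.0):
    # Add the pseudocount alpha to EVERY cell of the count matrix,
    # then renormalise each row; zero counts become small but nonzero.
    trans = {}
    for x in states:
        row = {y: pair_counts[(x, y)] + alpha for y in states}
        total = sum(row.values())
        trans[x] = {y: row[y] / total for y in states}
    return trans

trans = smoothed_trans()
print(trans["A"]["G"])  # nonzero despite A->G never being observed
```

Because the smoothing happens at the count stage, the same smoothed matrix then serves both purposes consistently: scoring user-entered sequences (no zero factors in the product) and generating new sequences (every transition has some chance of being sampled).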
A discussion would be helpful; no code or mathematics needs to be given.