1

I have text in a column and I would like to build a Markov chain from it. Is there a way to estimate a Markov chain for the states A, B, C and D, and then generate a chain with those states? Any thoughts?

A <- c('A-B-C-D', 'A-B-C-A', 'A-B-A-B')
user3570187
  • can you be a little more specific? How would you like to specify the matrix of transition probabilities? – Ben Bolker Dec 31 '16 at 14:38
  • This question looks related / like you may find its answers useful http://stackoverflow.com/questions/2754469/r-library-for-discrete-markov-chain-simulation?rq=1 – hodgenovice Dec 31 '16 at 14:40
  • What have you tried? There is lots of information about R and Markov chains. There is also [this package](https://cran.r-project.org/web/packages/markovchain/index.html) – R. Schifini Dec 31 '16 at 14:41
  • I can do this when the states are listed column-wise, using statetable.msm from the msm package, but this is a little trickier because the patterns are in rows. I will look into the PDF. Thanks! – user3570187 Dec 31 '16 at 14:47
  • So, I want the transitions to be A-B (1), B-C (.75), B-A (.25), C-A (.50), C-D (.50), A-B (1) – user3570187 Dec 31 '16 at 14:50
  • It is a third-order Markov chain with 4 states. Each row represents one possible sequence of transitions, like (Snow-Rain-Snow-Sunny) – user3570187 Dec 31 '16 at 14:52

2 Answers

2

If you want to compute the transition probability matrix (row stochastic) with MLE from the data, try this:

A <- c('A-B-C-D', 'A-B-C-A', 'A-B-A-B', 'D-B-C-A') # the data: your example, slightly modified
# build one (from, to) row for every consecutive pair of states in each sequence
pairs <- lapply(strsplit(A, split = '-'), function(x) cbind(x[-length(x)], x[-1]))
df <- as.data.frame(do.call(rbind, pairs))
tr.mat <- table(df[,1], df[,2]) # transition counts: rows = from, columns = to
tr.mat <- tr.mat / rowSums(tr.mat) # make the matrix row-stochastic
tr.mat

  #           A         B         C         D
  # A 0.0000000 1.0000000 0.0000000 0.0000000 # P(A|A), P(B|A), P(C|A), P(D|A) with MLE from data
  # B 0.2500000 0.0000000 0.7500000 0.0000000
  # C 0.6666667 0.0000000 0.0000000 0.3333333
  # D 0.0000000 1.0000000 0.0000000 0.0000000
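
The question also asks how to generate a chain from these states. A minimal base-R sketch, assuming a first-order chain and the transition matrix printed above, could look like the following. Note that `simulate_chain`, the starting state, and the chain length are illustrative choices and not part of any package or of the original answer.

```r
# Transition matrix copied from the output above (rows: from, columns: to).
states <- c("A", "B", "C", "D")
P <- matrix(c(0,    1, 0,    0,
              0.25, 0, 0.75, 0,
              2/3,  0, 0,    1/3,
              0,    1, 0,    0),
            nrow = 4, byrow = TRUE,
            dimnames = list(states, states))

# simulate_chain is a hypothetical helper written for this example.
simulate_chain <- function(P, start, n) {
  states <- rownames(P)
  path <- character(n)
  path[1] <- start
  for (i in seq_len(n - 1)) {
    # Draw the next state, using the current state's row as the probabilities.
    path[i + 1] <- sample(states, size = 1, prob = P[path[i], ])
  }
  path
}

set.seed(1)  # only so that repeated runs give the same chain
chain <- simulate_chain(P, start = "A", n = 10)
paste(chain, collapse = "-")
```

Because each draw depends only on the current state, this produces a first-order chain; a higher-order chain would need a transition table indexed by the preceding states as well.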
Sandipan Dey
2

Since you mentioned that you know how to work with statetable.msm, here's a way to translate the data into a form it can handle:

dd <- c('A-B-C-D', 'A-B-C-A', 'A-B-A-B')

Split on dashes and arrange in columns:

d2 <- data.frame(do.call(cbind,strsplit(dd,"-")))

Arrange in a data frame, identified by sequence:

d3 <- tidyr::gather(d2)

Construct the transition matrix:

library(msm) # provides statetable.msm
statetable.msm(value, key, data = d3)
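
For readers without msm or tidyr installed, the same counting idea can be sketched in base R. This is an illustration of the long-format approach, not msm's actual implementation; the names `long`, `seq_id`, `counts` and `probs` are made up for the example. Unlike the `cbind` step, it does not assume that all sequences have the same length.

```r
dd <- c('A-B-C-D', 'A-B-C-A', 'A-B-A-B')
steps <- strsplit(dd, "-")

# Long format: one row per observed state, tagged with the sequence it came from.
long <- data.frame(
  seq_id = rep(seq_along(steps), lengths(steps)),
  state  = unlist(steps),
  stringsAsFactors = FALSE
)

# Pair each state with its predecessor, dropping pairs that span two sequences.
n <- nrow(long)
same_seq <- long$seq_id[-1] == long$seq_id[-n]
from <- long$state[-n][same_seq]
to   <- long$state[-1][same_seq]

# Use a common set of levels so the table is square even if a state never occurs as "from".
lv <- sort(unique(long$state))
counts <- table(from = factor(from, levels = lv), to = factor(to, levels = lv))
counts

# Row-normalise the counts into transition probabilities.
# D is never followed by anything in this data, so its row is 0/0 = NaN.
probs <- prop.table(counts, margin = 1)
probs
```

The NaN row for D is a reminder that a state which only ever appears at the end of a sequence has no estimated outgoing probabilities, so some convention is needed for it before simulating from the matrix.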
Ben Bolker