0

The following source code is from a book. Comments are written by me to understand the code better.

#==================================================================
# markov(init,mat,n,states) = Simulates n steps of a Markov chain 
#------------------------------------------------------------------
# init = initial distribution 
# mat = transition matrix 
# labels = a character vector of states used as label of data-frame; 
#           default is 1, .... k
#-------------------------------------------------------------------
markov <- function(init,mat,n,labels) 
{ 
    if (missing(labels)) # check if 'labels' argument is missing
    {
        labels <- 1:length(init) # obtain the length of init-vecor, and number them accordingly.
    }

    simlist <- numeric(n+1) # create an empty vector of 0's
    states <- 1:length(init)# ???? use the length of initial distribution to generate states.
    simlist[1] <- sample(states,1,prob=init) # sample function returns a random permutation of a vector.
                        # select one value from the 'states' based on 'init' probabilities.

    for (i in 2:(n+1))
    { 
        simlist[i] <- sample(states, 1, prob = mat[simlist[i-1],]) # simlist is a vector.
                                                    # so, it is selecting all the columns 
                                                    # of a specific row from 'mat'
    }

    labels[simlist]
}
#==================================================================

I have a few confusions regarding this source code.

Why is states <- 1:length(init) used to generate states? What if states are like S ={-1, 0, 1, 2,...}?

user366312
  • 16,949
  • 65
  • 235
  • 452

1 Answers1

1

Names of the states don't really need to have any statistical meaning as long as they are different. So, while simulating transitions between states, it's perfectly fine to choose states <- 1:length(init) or any other names for them. Ultimately, though, for practical purposes we often have in mind some labels in mind, such as -1, 0, ..., n, as in your example. You can provide those names as the labels parameter and then labels[simlist] will rename 1:length(init) to labels, element by element. I.e., if initially we had c(1, 2, 3) and you provided labels as c(5, 10, 12), then the output will be in terms of the latter vector. For instance,

(states <- sample(1:3, 10, replace = TRUE))
# [1] 1 3 3 2 2 1 2 1 3 3
labels <- c(5, 10, 12)
labels[states]
# [1]  5 12 12 10 10  5 10  5 12 12
Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
  • *Names of the states don't really have any statistical meaning as long as they are different.* - What if we need to calculate expectation? – user366312 May 09 '19 at 14:50
  • @user366312, yes, you are right, names *can* be meaningful, though I'd add that one should think before doing something like calculating expectations. For example, if there are just two states, 0 and 1, is something like 0.65 as the expected value a meaningful figure? Regardless of this good point, the rest of my post answers the technical part of your question. In particular, even if those names matter for some *subsequent* analysis, they do not matter for the simulation part, which makes it ok to use `1:length(init)`. – Julius Vainora May 09 '19 at 14:54
  • Why is `labels[simlist]` even needed here? – user366312 May 09 '19 at 23:56
  • @user366312, only for what I mentioned in my answer. You could return your simulation result as just `simlist`, in which case states would be names as `1:length(init)`. Now as to use the names provided by `labels`, we need instead `labels[simlist]`, which works as in the example in my answer. I.e., state 1 will now be renamed to `labels[1]`, state 2 to `labels[2]`, etc. – Julius Vainora May 10 '19 at 14:00