-1

I am trying to write a program that sets a state from A to state B and vice versa.

rnumbers <- data.frame(replicate(5,runif(2000, 0, 1)))

I am imagining this data frame of random numbers in a uniform distribution, except it has 10000 rows instead of 20 rows.

Setting the probability of going to state A and state B :

dt <- c(.02)
A <- dt*1
B <- dt*.5

Making a function that goes through data frame rnumbers and putting in a 0 if the number is less than B and a 1 if the number is less than A.

step_generator <- function(x){
    step <- ifelse ( x <  B, 0, ifelse(x < A, 1, NA))
    return(step)
    }
state <- apply(rnumbers, 2, step_generator)

This essentially gives me what I want - a data frame with columns that contain 0, 1, or NA depending on the value of the random number in rnumbers. However, I am missing a couple of things--

1) I would like to keep track of how long each state lasts. What I mean by that, is if you imagine each row as a change in time as above (dt <- c(.02)). I want to be able to plot "state vs. time". In order to address this, this is what I tried :

state1 <- transform(state, time = rep(dt))
state2 <- transform(state1, cumtime = cumsum(time))

This gets me close to what I want, cumtime goes from .02 to .4. However, I want the clock to start at 0 in the 1st row and add .02 to every subsequent row.

2) I need to know how long each state lasts for. Essentially, I want to be able to go through each column, and ask for how much time (cumsum) does each state last. This would then give me a distribution of times for state A and state B. I want this stored in another data frame.

I think this makes sense, if anything is unclear please let me know and I will clarify.

Thanks for any and all help!

Frank
  • 66,179
  • 8
  • 96
  • 180
user2813055
  • 283
  • 4
  • 13
  • `rle` can be used to find the length of a spell. – Frank Nov 11 '13 at 01:16
  • using 'rle' on this data frame accounts for NA values in the columns, which doesn't get me where i want to be. is there a work around for this problem @Frank ? – user2813055 Nov 11 '13 at 02:12
  • 1
    Does NA mean "it stays in the previous state"? If so, you may want to take a more standard approach to Markov chains. Here's a relevant question: http://stackoverflow.com/q/2754469/1191259 – Frank Nov 11 '13 at 02:14
  • I get all NA's when I run the code. Can you review the settings for A and B and clarify what is desired. In particular it's unclear whether you really want to be creating an NA-state. – IRTFM Nov 11 '13 at 02:29
  • @DWin I just edited the code so you should be receiving some 1 and 0's. You only received NA because the probability of states is low and they do not appear when there is only 20 rows, so I have made rnumbers have 2000 rows. Alternatively, I could have made dt larger to increase probability of states A and B. No i do not want there to be NA's, I just couldn't figure out how to not include them. – user2813055 Nov 11 '13 at 03:05
  • @Frank Yes NA means it stays in the previous state, i.e. the random number in that row does not meet criteria to be labelled either a 0 or 1. – user2813055 Nov 11 '13 at 03:06
  • Well _that_ was annoying. All that changed is that the number of NA's now occupy 100 screens of space rather than only 20 lines. I see no reason to increase the matrix dimensions. Will please clarify A) the desired starting states and B) Describe the transition rules better, and C) Get rid of the NA's. – IRTFM Nov 11 '13 at 03:12
  • @DWin sorry for the annoyance. I would like there to be no NA's, I just don't know how. The transition rules are as follows - start state is 0 then sample random uniform distribution between 0 and 1, if the number is less than .02*.5 then stay in state 0, if the number is less than .02*1 and greater than .02*.5 change to state 1. If neither are true stay in previous state. Repeat thousands of times. – user2813055 Nov 11 '13 at 14:11

1 Answers1

1

The range between "number is less than .02*1 and greater than .02*.5" is very narrow so if you are setting this simulation up, most of the first row will most probably be zero. You cannot really hope to get success with ifelse when the conditions have any look-back features. That function doesn't allow "back-indexing".

rstate <- rnumbers  # copy the structure
rstate[] <- NA      # preserve structure with NA's
# Init:
rstate[1, ] <- rnumbers[1, ] <  .02 & rnumbers[1, ] > 0.01

step_generator <- function(col, rnum){
    for (i in 2:length(col) ){
            if( rnum[i] < B) { col[i] <- 0  }
                       else { if (rnum[i] < A) {col[i] <- 1 }
                              else {col[i] <- col[i-1] } }
                        }
    return(col)
    }
#  Run for each column index:
for(cl in 1:5){ rstate[ , cl] <- 
                        step_generator(rstate[,cl], rnumbers[,cl]) }
IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • this works thanks. How would I go about figuring out how many rows each state lasted? For example, in one column how many rows did the initial stat last? the second state? the third? fourth? etc. Essentially giving me a distribution of length of each state? – user2813055 Nov 11 '13 at 22:45
  • You can use either `rle` to get a lossless version of 0/1 durations or just `sum(col)` to get a count, since the values are 0/1. – IRTFM Nov 11 '13 at 22:51