JAGS: Use cell matrix as data input for excluding NA

Question

Below is my model and example data I would use. There are some NA in the data that I needed to set priors to generate numbers but this way may cause some errors. I was wondering if I could just let JAGS skip the NA so like have a matrix with different rows and columns.

NAs are in ex_expectancy and ex_shock.

# data
# 3 subjects * 14 trials
ex_expectancy <- structure(list(`1` = c(9L, 5L, 1L), `2` = c(5L, 6L, 1L), `3` = c(2L, 7L, 4L), `4` = c(3L, 6L, 2L), `5` = c(9L, 6L, 4L), `6` = c(9L, 7L, 1L), `7` = c(3L, 5L, 5L), `8` = c(8L, 5L, 1L), `9` = c(10L, 5L, NA), `10` = c(9L, NA, NA), `11` = c(2L, NA, NA), `12` = c(3L,NA, NA), `13` = c(3L, NA, NA), `14` = c(4L, NA, NA)), row.names = c(NA,-3L), class = c("data.table", "data.frame"))
ex_shock <- structure(list(`1` = c(0L, 1L, 1L), `2` = c(0L, 1L, 0L), `3` = c(1L, 0L, 1L), `4` = c(1L, 0L, 1L), `5` = c(0L, 1L, 1L), `6` = c(0L, 0L, 0L), `7` = c(1L, 0L, 1L), `8` = c(1L, 1L, 1L), `9` = c(0L,1L, NA), `10` = c(1L, NA, NA), `11` = c(1L, NA, NA), `12` = c(0L, NA, NA), `13` = c(1L, NA, NA), `14` = c(0L, NA, NA)), row.names = c(NA,-3L), class = c("data.table", "data.frame"))

v <- matrix(NA, nrow=3,ncol=14)
v[,1] <- 0 # first v is 0

dlist <- list(
  Nsubjects = 3,
  Ntrials = 14,
  expectancy = ex_expectancy, 
  shock = ex_shock,
  v=v
)

myinits <-  list(list(
  alpha = runif (3,0,1))) # 3 subjects
parameters <- c('alpha','v','predk','scale','c','tau')

# model
RW <- function(){
  for (i in 1:Nsubjects)
  {
    for (j in 2:Ntrials) # for each trial
    {
      expectancy [i,j] ~ dnorm (scale [i] * v[i,j] + c[i],tau[i,j])
      # posteiror predictive
      predk [i,j] ~ dnorm (scale [i] * v[i,j] + c[i],tau[i,j])
      
      
      pe [i,j-1] <- shock [i,j-1] - v [i,j-1]
      v [i,j] <- v [i,j-1] + alpha [i]  * pe [i,j-1]
    }
  }
  # priors
  for (i in 1: Nsubjects){
    alpha [i] ~ dunif (0,1)
    scale [i] ~ dunif (0,10)
    c[i] ~ dunif (0,5)
    
    for (j in 1:Ntrials){
      sigma[i,j] ~ dunif (0,5)
      tau [i,j] <- 1/pow(sigma [i,j],2)
    }}
}


samples <- jags(dlist, inits=myinits, parameters,
                model.file = RW,
                n.chains=1, n.iter=1000, n.burnin=500, n.thin=1, DIC=T)

Without any information on how you plan to use this it's hard to make suggestions. Look into nested indexing though (e.g., https://github.com/mfidino/jags-ragged-array). The linked example of mine is for species occupancy data, but it looks like it shares a lot of similarities to your matrix here (1's, 0's, and NA values). — mfidino, Mar 11 '21 at 13:53
Thank you so much. I will follow this nested indexing approach to see if I can solve the NA problem! — Kenny, Mar 11 '21 at 14:05
If it does work let me know so I can put something here as an answer (instead of having a link in a comment). Best of luck! — mfidino, Mar 12 '21 at 13:37
Yes, I will! Sorry I am still trying to understand how it work and how to implement it to my data. I will update you here once I figure it out (or not)! — Kenny, Mar 13 '21 at 11:15
@mfidino, Sorry, I have been trying to implement the ragged array myself but still not sure how to do it. I have put my model and example data in the question, would you mind take a look if my data structure is appropriate to use this approach? — Kenny, Mar 14 '21 at 09:44
Yeah it looks like nested indexing SHOULD do the trick here. I'll have some time in the next day or so to apply it to the data & model here. — mfidino, Mar 14 '21 at 18:29
I think there is some confusion right now with your representation of subjects and trials. For example, the data you are supplying is ordered (I think) trial by subject (3 rows, 14 columns). However, the priors you have specified are in a completely very different. `alpha`, for example, is of length 42. `sigma` is a matrix that is 42 x 3. I'm assuming something like `sigma` should be the same size as `ex_shock`, but I'm not sure how long `alpha` should be (i.e., you need to fix this a bit more or provide a little more direction before I can do the nested indexing). — mfidino, Mar 15 '21 at 12:18
I am really sorry about the errors. The example I made is 3 subjects (i=3) and 14 trials (j=14). alpha is subject based so the length for initial value should be 3. Sigma and tau are for controlling trial error which is why they are a matrix of i*j. I will double check the data and make sure everything is correct. — Kenny, Mar 15 '21 at 13:00

score 1 · Accepted Answer · answered Mar 15 '21 at 15:19

So the workaround here is a little bit easier than my standard nested indexing because you are always missing data on the right side of your matrices (ie.,once data is NA it is NA for the rest of the column). As such, instead of needing to apply nested indexing within the loops you can just apply it to the second for loop (I'm using runjags here as that is what I am most familiar with).

# data
ex_expectancy <- structure(list(`1` = c(9L, 5L, 1L), `2` = c(5L, 6L, 1L), `3` = c(2L, 7L, 4L), `4` = c(3L, 6L, 2L), `5` = c(9L, 6L, 4L), `6` = c(9L, 7L, 1L), `7` = c(3L, 5L, 5L), `8` = c(8L, 5L, 1L), `9` = c(10L, 5L, NA), `10` = c(9L, NA, NA), `11` = c(2L, NA, NA), `12` = c(3L,NA, NA), `13` = c(3L, NA, NA), `14` = c(4L, NA, NA)), row.names = c(NA,-3L), class = c("data.table", "data.frame"))
ex_shock <- structure(list(`1` = c(0L, 1L, 1L), `2` = c(0L, 1L, 0L), `3` = c(1L, 0L, 1L), `4` = c(1L, 0L, 1L), `5` = c(0L, 1L, 1L), `6` = c(0L, 0L, 0L), `7` = c(1L, 0L, 1L), `8` = c(1L, 1L, 1L), `9` = c(0L,1L, NA), `10` = c(1L, NA, NA), `11` = c(1L, NA, NA), `12` = c(0L, NA, NA), `13` = c(1L, NA, NA), `14` = c(0L, NA, NA)), row.names = c(NA,-3L), class = c("data.table", "data.frame"))


v <- matrix(NA, nrow=3,ncol=14)
v[,1] <- 0

dlist <- list(
  NSubjects = 3,
  Ntrials = 14 - rowSums(is.na(ex_shock)),
  maxTrials = 14,
  expectancy = as.matrix(ex_expectancy), 
  shock = as.matrix(ex_shock),
  v = v
)

myinits <-  list(list(
  alpha = runif (3,0,1)))
parameters <- c('alpha','v','predk','scale','c','tau')


{sink("model.txt")
cat("
model{
    for (i in 1:NSubjects){
      for (j in 2:Ntrials[i]){
        expectancy[i,j] ~ dnorm (scale[i] * v[i,j] + c[i],tau[i,j])
        # posteiror predictive
        predk[i,j] ~ dnorm (scale [i] * v[i,j] + c[i],tau[i,j])
        pe[i,j-1] <- shock[i,j-1] - v[i,j-1]
        v[i,j] <- v[i,j-1] + alpha[i]  * pe[i,j-1]
      }
    }
    # priors
    for (i in 1: NSubjects){
      alpha[i] ~ dunif (0,1)
      scale[i] ~ dunif (0,10)
      c[i] ~ dunif (0,5)
      
      for (j in 1:maxTrials){
        sigma[i,j] ~ dunif (0,5)
        tau[i,j] <- 1/pow(sigma [i,j],2)
      }}
  }"
  ,fill = TRUE)
}
sink()



library(runjags)


samples <- run.jags("model.txt", monitor = parameters, data = dlist,
                    n.chains = 2,sample = 10000, burnin = 5000,
                    thin = 1)

Basically Ntrials becomes a vector of length NSubjects. By applying that small change the model will compile and run. This does not, however, address any potential fitting issues with the model. As I'm unsure what you are actually fitting, I do not know if the model is correct as specified. Looking at the output of the mcmc it looks as if something odd is still going on (some parts of predk and tau are NA).

library(coda)
 my_mcmc <- as.matrix(as.mcmc.list(samples))

round(my_mcmc[1,],2)
  alpha[1]    alpha[2]    alpha[3]      v[1,1]      v[2,1]      v[3,1] 
       0.23        0.13        0.75        0.48        0.32        0.12 
     v[1,2]      v[2,2]      v[3,2]      v[1,3]      v[2,3]      v[3,3] 
       0.23        0.09        0.05        0.08        0.29        7.60 
     v[1,4]      v[2,4]      v[3,4]      v[1,5]      v[2,5]      v[3,5] 
       0.05        0.11        0.10        0.15        0.62        0.00 
     v[1,6]      v[2,6]      v[3,6]      v[1,7]      v[2,7]      v[3,7] 
       0.00        0.00        0.00        0.13        0.75        0.00 
     v[1,8]      v[2,8]      v[3,8]      v[1,9]      v[2,9]     v[1,10] 
       0.24        0.19        0.23        0.21        0.80        0.41 
    v[1,11]     v[1,12]     v[1,13]     v[1,14]  predk[1,2]  predk[2,2] 
       0.18        0.95        0.32        0.29          NA          NA 
 predk[3,2]  predk[1,3]  predk[2,3]  predk[3,3]  predk[1,4]  predk[2,4] 
         NA        5.30       -0.48        1.83        2.01        7.30 
 predk[3,4]  predk[1,5]  predk[2,5]  predk[3,5]  predk[1,6]  predk[2,6] 
       0.57        2.77        6.49        2.37        2.82        5.76 
 predk[3,6]  predk[1,7]  predk[2,7]  predk[3,7]  predk[1,8]  predk[2,8] 
       6.23       -4.78        7.10        5.12        3.10        0.95 
 predk[3,8]  predk[1,9]  predk[2,9] predk[1,10] predk[1,11] predk[1,12] 
      -0.34       -0.31       10.04        4.13        2.60        9.53 
predk[1,13] predk[1,14]    scale[1]    scale[2]    scale[3]        c[1] 
         NA       10.83          NA          NA        1.48        2.46 
       c[2]        c[3]    tau[1,1]    tau[2,1]    tau[3,1]    tau[1,2] 
       4.75        1.36          NA          NA       10.40          NA 
   tau[2,2]    tau[3,2]    tau[1,3]    tau[2,3]    tau[3,3]    tau[1,4] 
         NA        2.61          NA          NA       -2.68          NA 
   tau[2,4]    tau[3,4]    tau[1,5]    tau[2,5]    tau[3,5]    tau[1,6] 
         NA        1.35        8.25        1.19        0.14        0.11 
   tau[2,6]    tau[3,6]    tau[1,7]    tau[2,7]    tau[3,7]    tau[1,8] 
       0.08        0.10        0.09        0.22        0.76        4.85 
   tau[2,8]    tau[3,8]    tau[1,9]    tau[2,9]    tau[3,9]   tau[1,10] 
       0.70       16.36        1.66        1.59        0.05        3.98 
  tau[2,10]   tau[3,10]   tau[1,11]   tau[2,11]   tau[3,11]   tau[1,12] 
       0.07        0.05       53.40        0.08        9.30        0.08 
  tau[2,12]   tau[3,12]   tau[1,13]   tau[2,13]   tau[3,13]   tau[1,14] 
       0.18        0.10        0.12        0.87        0.08        0.09 
  tau[2,14]   tau[3,14] 
       5.40        0.04 ```

Thank you very much for your time and efforts. My real data has NA in different places rather than all at the right side. But I think your answer here indeed is a good start for me to understand what happen and what I should to. Thank you! — Kenny, Mar 15 '21 at 17:07
Ah, yeah missing data within the matrix (instead at the end of the row) would require some additional indexing within the for loops. This is made a bit trickier as you have an autoregressive model (response at time t depends on response at t-1). — mfidino, Mar 15 '21 at 17:23
Sorry for bothering you again. I applied this to my data (I made it as the same structure as the example here), but the sampling results still have number for 'v' in the trials where 'expectancy' and 'shock' are NA. Is it normal? May I show you the results in the stackoverflow chat? — Kenny, Mar 23 '21 at 11:43
It's likely because of this line `v[i,j] <- v[i,j-1] + alpha[i] * pe[i,j-1]`. So long as `v` is not `NA` in the first column you should likely end up with an estimate of `v`. — mfidino, Mar 23 '21 at 14:56
normally if the indexing is correct v would be 0 with trials with NA for expectancy and shock right? — Kenny, Mar 23 '21 at 17:14
Based on that line of code `v` would be a function of the value in at `j-1`, `alpha`, and `pe`. You used this code and then still have missing NA NOT on the right side, correct? — mfidino, Mar 23 '21 at 19:51
nono, I only have NA for `expectancy` and `shock` which are all on the right side. And the first column for `v` is 0 with the rest estimated by `v` in previous trial, `alpha`, and `pe`. Isn't it be the case if `[i,Ntirals[i]]` = e.g., [1,13], then we will only have 13 v instead of 14 v? — Kenny, Mar 24 '21 at 10:00
if `Ntrials[i] = 13`, then I would expect (given the above model) for there to be `v` values for 13 of them instead of 14, with the first value being `0`. The `v` matrix will have dimensions equal to `NSubjects` and the maximum value in the `Ntrials` vector, though I would expect there to be NA's for when there was no data. If there are estimated `v` values where there should not be then you need to look through your data as it is likely an issue there. — mfidino, Mar 24 '21 at 12:39
Thank you very much again for this further discussion. The last thing I want to confirm is, if everything is correct, JAGS should return `v` in trial 14 where there is no data as NA, even when I see 0 there should be something wrong right? — Kenny, Mar 24 '21 at 13:18
What JAGS returns is likely going to be associated to what JAGS evaluates under the hood. If it is returning a 0 for `v` in trial 14 when you know there is no data then to me it seems as if you just ignore it. — mfidino, Mar 24 '21 at 13:23

JAGS: Use cell matrix as data input for excluding NA

1 Answers1