
I am trying to create many permutations of my data while preserving my stratified design. I need to fit a model to each randomized data set and then extract the coefficients.

I tried the gtools package's permute(), but it does not stratify the way I need. The permute package's shuffleSet() appears to support my design, but I cannot find any documentation on how to use the permutationMatrix it returns for modeling. I have resorted to a for loop:

```r
library(permute)   # how(), Within(), Plots(), shuffle()
library(survival)  # clogit(), strata()
blks <- as.factor(df$block)
plts <- as.factor(df$plot)
# set the way in which permute approaches the data
CTRL <- how(within = Within(type = "free"), plots = Plots(strata = plts), blocks = blks)

set.seed(1717)
no.perm <- 100                # set the number of permutations
random_model <- data.frame()  # create a place to hold the result
for (i in 1:no.perm) {
  shuffled <- shuffle(nrow(df), control = CTRL)  # permute row indices according to the CTRL design
  df_shuffled <- df[shuffled, ]                  # shuffle() returns row indices, so use them to reorder the data
  coefs <- summary(clogit(response ~ pred1 + pred2 + pred3 + strata(plot), data = df_shuffled))$coefficients  # model and extract summary
  random_model <- rbind(random_model, coefs)     # add to the results
}
```

If I run the shuffle() line on its own, I get a different result each time. However, the whole loop returns the same three coefficients 100 times. I am not sure where I am going wrong. Is there a way to get my loop to fit a model to each permuted data set and return a summary?
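The identical coefficients can be reproduced without permute at all. Reordering whole rows keeps every response attached to its own predictors and plot, and the conditional likelihood that `clogit()` maximises is a sum over strata, so row order cannot change the fit. A minimal sketch, assuming simulated data in place of the question's `df` (the columns `plot`, `pred1`, `pred2`, `response` and the objects `form`, `fit_original`, `fit_rows` are hypothetical) and the survival package:

```r
library(survival)  # clogit(), strata()

set.seed(1)
# Hypothetical stand-in for the question's df: 20 plots of 5 rows each,
# with exactly one response = 1 per plot (a matched design for clogit)
n_plot <- 20
df <- data.frame(
  plot  = factor(rep(seq_len(n_plot), each = 5)),
  pred1 = rnorm(n_plot * 5),
  pred2 = rnorm(n_plot * 5)
)
df$response <- ave(numeric(nrow(df)), df$plot,
                   FUN = function(x) replace(x, sample(length(x), 1), 1))

form <- response ~ pred1 + pred2 + strata(plot)
fit_original <- clogit(form, data = df)

# Reorder whole rows (here even across plots), as df[shuffled, ] does in the loop
df_rows <- df[sample(nrow(df)), ]
fit_rows <- clogit(form, data = df_rows)

# Compare the two sets of estimates
all.equal(coef(fit_original), coef(fit_rows))
```

Because rows move as intact units, this comparison should agree up to floating-point tolerance, which matches the "same coefficients 100 times" symptom.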

Thanks so much!

Tess H
  • If you create `shuffled <- shuffleSet(nrow(df), nset = no.perm, control = CTRL)` outside the loop, you can do `df_shuffled <- df[shuffled[i, ], ]` to iterate over the set of indices (each row of the matrix returned by `shuffleSet()` is one permutation). If you create a reproducible example (sample some random values for `pred` etc. and for `response`, and create factors for `plot` and `block`), I can take a closer look for you. But it looks like you are shuffling the entire `df`, which shuffles the predictor variables and the response together. You are basically just reordering the data within `plots` and `blocks`, and as row order doesn't matter for the model, you get the same coefs. – Gavin Simpson Jun 03 '21 at 05:23
  • You should shuffle only the response variable. Outside the loop do `df_shuffled <- df` to take a copy. Then in the loop do `df_shuffled[["response"]] <- df[shuffled[i, ], "response"]` to reorder the response and store it back into `df_shuffled`, which you pass to `clogit()`. Now you have broken the relationship between `response` and your covariates while preserving the blocking structure of your data. – Gavin Simpson Jun 03 '21 at 05:27
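Putting the two comments together, here is a sketch of the suggested approach: generate all permutations once with `shuffleSet()`, then permute only `response` within plots and refit. It assumes simulated data in place of the question's `df` (the columns `block`, `plot`, `pred1`, `pred2`, `response` and the objects `form`, `coefs`, `df_shuffled` are hypothetical), the permute and survival packages, and that plots are nested within blocks. Under that nesting, free permutation within plots can never move a value between blocks, so the question's `blocks = blks` term is left out of `how()` here.

```r
library(permute)   # how(), Within(), Plots(), shuffleSet()
library(survival)  # clogit(), strata()

set.seed(1717)
# Hypothetical data: 4 blocks x 5 plots x 6 rows, one response = 1 per plot
d <- expand.grid(row = 1:6, plot = 1:5, block = 1:4)
df <- data.frame(
  block = factor(d$block),
  plot  = factor(paste(d$block, d$plot, sep = "_")),  # plot IDs unique across blocks
  pred1 = rnorm(nrow(d)),
  pred2 = rnorm(nrow(d))
)
df$response <- ave(numeric(nrow(df)), df$plot,
                   FUN = function(x) replace(x, sample(length(x), 1), 1))

# Free permutation of rows within each plot (plots are nested in blocks)
CTRL <- how(within = Within(type = "free"), plots = Plots(strata = df$plot))

no.perm  <- 99
# A no.perm x nrow(df) permutationMatrix: each row is one permutation of 1:nrow(df)
shuffled <- shuffleSet(nrow(df), nset = no.perm, control = CTRL)

form  <- response ~ pred1 + pred2 + strata(plot)
coefs <- matrix(NA_real_, nrow = no.perm, ncol = 2,
                dimnames = list(NULL, c("pred1", "pred2")))
df_shuffled <- df  # working copy; only its response column changes
for (i in seq_len(no.perm)) {
  # Permute the response only, breaking its link with the predictors
  df_shuffled$response <- df$response[shuffled[i, ]]
  coefs[i, ] <- coef(clogit(form, data = df_shuffled))
}
head(coefs)
```

Because each permutation stays inside a plot, every plot keeps exactly one `response = 1`, so the matched structure that `clogit()` relies on is preserved. The observed estimates, `coef(clogit(form, data = df))`, can then be compared against the columns of `coefs`.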

0 Answers