6

I apologize in advance if my question seems really simplistic or naive, but I'm trying to understand what the function simulate does, conceptually (i.e., I'm interested in the logic of it, independently of whether it's applied to lm, lme, etc.)

Say I'm doing a simple multiple regression on the following data:

n <- 40

x1 <- rnorm(n, mean=3, sd=1)

x2 <- rnorm(n, mean=4, sd=1.25)

y <- 2*x1 + 3*x2 + rnorm(n, mean=2, sd=1)

mydata <- data.frame(x1, x2, y)

mod <- lm(y ~ x1 + x2, data=mydata)

What does the function simulate do when applied to that case? So if I do:

simulate(mod, nsim=2)

What are the two vectors I obtain?

In essence, is it similar to doing:

replicate(2, y + rnorm(n=length(y), mean="some value", sd="some other value"))

If it is similar to that logic, then what would "some value" and "some other value" be? Would they be mean(mod$residuals) and sd(mod$residuals)? Or a permutation of actual residuals? Or something else entirely?

Or is it doing something completely different?

If anyone could explain/confirm how simulate works in simple, non-technical terms, it would be greatly appreciated.

Alex A.
  • 5,466
  • 4
  • 26
  • 56
msoftrain
  • 1,017
  • 8
  • 23
  • 1
    You can take a look at the source code using `stats:::simulate.lm`. But since this is more about statistical concepts, I might recommend [Cross Validated](http://stats.stackexchange.com) over Stack Overflow. – Alex A. Mar 30 '15 at 15:43

1 Answers1

4

It basically does what the help file says: 'Simulate one or more responses from the distribution corresponding to a fitted model object.'

So for each simulation a random draw is taken from the conditional distribution of the outcome variable conditional on the covariates. This conditional distribution is a normal distribution by default in lm. The standard deviation of this normal distribution corresponds to the sqrt of MSE of mod.

The code below replicates the output (given you use the same seed):

set.seed(1)
head(simulate(mod, nsim=2))

set.seed(1)
for(i in 1:nsim) {
  tmp <- predict(mod) + rnorm(length(predict(mod)), 0, summary(mod)$sigma)
  res <- if (i==1) tmp else cbind(res, tmp)
}
head(res)
johansteen
  • 318
  • 1
  • 9
  • In the case of `family = 'poisson'` the procedure is the same with rounding to the integer? – Barbab Jan 02 '22 at 10:33