How can I simulate data so that the coefficients recovered by `lm` exactly equal particular pre-determined values, with normally distributed residuals? For example, could I generate data so that `lm(y ~ 1 + x)` will yield `(Intercept) = 1.500` and `x = 4.000`? I would like the solution to be versatile enough to work for multiple regression with continuous `x` (e.g., `lm(y ~ 1 + x1 + x2)`), but there are bonus points if it works for interactions as well (`lm(y ~ 1 + x1 + x2 + x1*x2)`). Also, it should work for small N (e.g., N < 200).
I know how to simulate random data generated from these parameters (see e.g. here), but that randomness carries over to variation in the estimated coefficients, e.g., `Intercept = 1.488` and `x = 4.067`.
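To illustrate the problem, here is a minimal sketch of the usual simulation approach. The seed, the sample size, and the unit error variance are arbitrary choices made only for this illustration:

```r
set.seed(1)                   # arbitrary seed, only for reproducibility
n <- 100                      # small sample (N < 200), chosen for illustration
x <- rnorm(n)                 # continuous predictor
y <- 1.5 + 4 * x + rnorm(n)   # true intercept 1.5, true slope 4, N(0, 1) errors

fit <- lm(y ~ 1 + x)
coef(fit)                     # estimates near, but not exactly, 1.5 and 4
```

The estimates land close to 1.5 and 4 but not exactly on them, and they change with every new draw of the errors. That sampling variation is what I want to eliminate.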
Related: It is possible to generate data that yields pre-determined correlation coefficients (see here and here). Can the same be done for the coefficients of a multiple regression?