How to write a loop to simulate sampling distribution of t-statistic under null using a true model?

Question

My problem is understanding how to simulate 10,000 draws while keeping the covariates fixed.

       Y      X1    X2    X3
   <int>   <dbl> <dbl> <int>
1   4264 305.657  7.17     0
2   4496 328.476  6.20     0
3   4317 317.164  4.61     0
4   4292 366.745  7.02     0
5   4945 265.518  8.61     1
6   4325 301.995  6.88     0

That is the head of the grocery data.

What I've done so far for other, related problems:

#5.
#using beta_hat
#build the design matrix: a column of ones (intercept) for the 52 rows, then X1, X2, X3
X <- cbind(rep(1,52), grocery$X1, grocery$X2, grocery$X3)
beta_hat <- solve((t(X) %*% X)) %*% t(X) %*% grocery$Y
round(t(beta_hat), 2)

#using lm formula and residuals
#lm formula
lm0 <- lm(formula = Y ~ X1 + X2 + X3, data = grocery)

#6.
residuals(lm0)[1:5]

Below is the summary output of the lm() fit above:

Call:
lm(formula = Y ~ X1 + X2 + X3, data = grocery)

Residuals:
    Min      1Q  Median      3Q     Max 
-264.05 -110.73  -22.52   79.29  295.75 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 4149.8872   195.5654  21.220  < 2e-16 ***
X1             0.7871     0.3646   2.159   0.0359 *  
X2           -13.1660    23.0917  -0.570   0.5712    
X3           623.5545    62.6409   9.954 2.94e-13 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 143.3 on 48 degrees of freedom
Multiple R-squared:  0.6883,    Adjusted R-squared:  0.6689 
F-statistic: 35.34 on 3 and 48 DF,  p-value: 3.316e-12

The result should be a loop that simulates the sampling distribution of the t-statistic. What I have so far is for another problem that focuses on fitting the model to the data.

Here I'm given the true model (under the null hypothesis), but I'm not sure where to begin with the loop.

I've made edits so the code is more readable and included image of the question since the symbols came out weird. If anyone can provide guidance on the first step, much appreciated. — cookiemonster3009, May 06 '19 at 21:30
So, you have X1, X2, and X3. You can use the true parameters for beta1, beta2 and beta3 to generate some Y*. You add random noise epsilon to Y* to get Y. Then you run the regression. You repeat the whole thing 10000 times (and in each iteration, you get new random noise). Where exactly do you get stuck? — coffeinjunky, May 06 '19 at 22:29
@coffeinjunky this was helpful, but where are you getting the random noise epsilon? I know X1, X2 and X3 are pulled from the dataset. — cookiemonster3009, May 07 '19 at 01:07
According to your equation 5, the random noise is normally distributed with the given mean and variance. Just draw it from the normal distribution. — coffeinjunky, May 07 '19 at 07:01
If the below answers your question, please consider accepting the answer by clicking on the appropriate button next to the start of the answer. This is so that others can see that this is no longer an open issue. If it is, please clarify what remains unclear. — coffeinjunky, May 16 '19 at 14:02

score 0 · Answer 1 · answered May 07 '19 at 07:16

Okay, have a look at the following:

# get some sample data:
set.seed(42)
df <- data.frame(X1 = rnorm(10), X2 = rnorm(10), X3 = rbinom(10, 1, 0.5))
# note how X1 gets multiplied with 0, to highlight that the null is imposed.
df$y_star <- with(df, 4200 + 0*X1 - 15*X2 + 620 * X3)
head(df)
            X1         X2 X3   y_star
1   1.37095845  1.3048697  0 4180.427
2  -0.56469817  2.2866454  0 4165.700
3   0.36312841 -1.3888607  0 4220.833
4   0.63286260 -0.2787888  1 4824.182
5   0.40426832 -0.1333213  0 4202.000

# define function to get the t statistic
get_tstat <- function(){
  # declare the outcome, with random noise added:
  # The added random noise here will be different in each draw
  df$y <- with(df, y_star + rnorm(nrow(df), mean = 0, sd = sqrt(20500)))
  # run linear model
  mod <- lm(y ~ X1 + X2 + X3, data = df)
  return(summary(mod)$coefficients["X1", "t value"])
}

# get 10 draws of the t-statistic (use 10000 instead of 10 for the full simulation):
replicate(10, get_tstat())
 [1] -0.8337737 -1.2567709 -1.2303073  0.3629552 -0.1203216 -0.1150734  0.3533095  1.6261360
 [9]  0.8259006 -1.3979176
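
To scale this up to what the question asks for, here is a minimal sketch of the same idea written as an explicit loop with 10,000 draws. The covariates below are hypothetical stand-ins so that the code runs on its own; with the real data you would use grocery$X1, grocery$X2 and grocery$X3 instead. The coefficients (4200, 0, -15, 620) and the error variance 20500 are simply the values used in the example above, and the names n_sims and t_stats are made up for illustration.

```r
set.seed(42)
n <- 52

# Hypothetical stand-in covariates; replace with the grocery columns.
X1 <- runif(n, min = 250, max = 400)
X2 <- runif(n, min = 4, max = 9)
X3 <- rep(c(0, 1), times = c(46, 6))

# The covariates are fixed: y_star is computed once, outside the loop.
# The coefficient on X1 is 0, which imposes the null hypothesis.
y_star <- 4200 + 0 * X1 - 15 * X2 + 620 * X3
sigma <- sqrt(20500)

n_sims <- 10000
t_stats <- numeric(n_sims)  # preallocate the result vector

for (i in seq_len(n_sims)) {
  # only the noise changes from one draw to the next
  y <- y_star + rnorm(n, mean = 0, sd = sigma)
  fit <- lm(y ~ X1 + X2 + X3)
  t_stats[i] <- summary(fit)$coefficients["X1", "t value"]
}

# Compare the simulated quantiles with a t distribution on n - 4 df
quantile(t_stats, c(0.025, 0.975))
qt(c(0.025, 0.975), df = n - 4)
```

If the simulation is set up correctly, the two pairs of quantiles should be close, and hist(t_stats) should look like a t distribution with 48 degrees of freedom. The loop and the replicate() call do the same job; the loop just makes the "fix the covariates, redraw the noise" structure explicit.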
