-1

I need to run the same exercise (Monte Carlo simulations) from Stata to R.

The codes I have used in Stata are the codes bellow. How can I do this using R? (I have searched for many tutorials, but I still didn't manage to do it in R).

* Simulations (10, 100 and 1000 sample replications/iterations)

clear
drop _all
set obs 100
set seed 54231
gen x = ((rnormal()))*10 + 40

* Generating true_y, considering Beta = 0,035
 
gen true_y = 5+0.03500*x
save truth, replace
twoway scatter true_y x

program hetero1
version 13
args c
use truth, clear
gen y = true_y + rnormal()
regress y x
end


foreach i in 10 100 1000 {
simulate _b, reps (`i'): hetero1
sum _b_x
twoway histogram _b_x, fraction xline(+0.03500, lcolor(red)) xline(`r(mean)', lcolor(green)) fcolor(none) lcolor(gs0) legend(off) title(`i' Repetições)
graph save graf`i'.gph, replace
}
 gr combine graf10.gph graf100.gph graf1000.gph
 graph export "graf.png", as(png) replace 

At the end, I need to obtain these 3 histograms (of estimated beta/coefficients), considering 10, 100 and 1000 sample replications. The red line refers to the "true" coefficient and the green one is the mean of the estimated coefficients - [see the image in the link]

enter image description here

DaveArmstrong
  • 18,377
  • 2
  • 13
  • 25
  • 1
    Mariana thank you for posting. I think it would be more helpful if you give the equations in a mathematical form and let us understand a bit the background. Your questions is interesting but hard to decode at the moment – LDT Jan 30 '21 at 22:45
  • @LDT thank you for your message/your help. Actually, this is part of a future exercise (which will be carried out with undergraduate students in public administration) about estimation - and about estimation bias (which is part of a code that I adapted now from DaveArmstrong's code/answer). As I am a very beginner in R, I really didn't know how to adapt the Stata code to R. – Mariana Costa Silveira Jan 31 '21 at 00:08
  • Hello @MarianaCostaSilveira maybe you can check also this approach https://stackoverflow.com/questions/64958902/equivalent-of-stata-command-simulate-in-r-for-montecarlo-simulation – Álvaro A. Gutiérrez-Vargas Jan 31 '21 at 19:07
  • I will check it; thank you very much for your help @LTD – Mariana Costa Silveira Feb 03 '21 at 18:14

1 Answers1

6

This should do it:

# Simulations (10, 100 and 1000 sample replications/iterations)
library(ggplot2)
library(dplyr)
library(gridExtra)
n <- 100
set.seed(54231)
x <- rnorm(n)*10 + 40

# Generating true_y, considering Beta = 0,035

true_y <- 5+0.03500*x

plot(x, true_y)

b <- t(replicate(1110, coef(lm(true_y + rnorm(n) ~ x))))

b <- as.data.frame(b) %>% 
  rename("a" = "(Intercept)", 
         "b" = "x") %>% 
  mutate(
    obs = 1:n(), 
    n = case_when(
      obs %in% 1:10 ~ "N = 10", 
      obs %in% 11:110 ~ "N = 100", 
      TRUE ~ "N = 1000"), 
    n = factor(n, levels=c("N = 10", "N = 100", "N = 1000")))
  

b10 <- b %>% filter(n == "N = 10") 
g1 <-   ggplot() + 
  geom_histogram(data = b10, aes(x=b), bins=3, col="white") + 
  geom_vline(xintercept = 0.03500, col="red") + 
  geom_vline(data = b10 %>% summarise(b=mean(b)), aes(xintercept = b), col="green") + 
  facet_wrap(~n) + 
  theme_bw()

b100 <- b %>% filter(n == "N = 100") 
g2 <-   ggplot() + 
  geom_histogram(data = b100, aes(x=b), bins=10, col="white") + 
  geom_vline(xintercept = 0.03500, col="red") + 
  geom_vline(data = b100 %>% summarise(b=mean(b)), aes(xintercept = b), col="green") + 
  facet_wrap(~n) + 
  theme_bw()

b1000 <- b %>% filter(n == "N = 1000") 
g3 <-   ggplot() + 
  geom_histogram(data = b1000, aes(x=b), bins=25, col="white") + 
  geom_vline(xintercept = 0.03500, col="red") + 
  geom_vline(data = b1000 %>% summarise(b=mean(b)), aes(xintercept = b), col="green") + 
  facet_wrap(~n) + 
  theme_bw()

library(gridExtra)
grid.arrange(g1, g2, g3, nrow=2)

enter image description here

DaveArmstrong
  • 18,377
  • 2
  • 13
  • 25
  • Perfect, your code helped me a lot, @DaveArmstrong! Thank you **very much** for your generous help! I also had to do "similar operations" (however, generating a Y with an error term dependent on X, to exemplify estimation problems) and it also worked.Thank you very much! – Mariana Costa Silveira Jan 30 '21 at 23:57