Problems reproducing in R survival analysis results originally generated in Stata

Question

I am attempting to reproduce some survival analysis results published in a journal. The original results were produced in Stata. Here is the code:

* COUPS
gen c_coup=c
replace c_coup=0 if exit!="coup"
stset time, id(leadid) failure(c_coup)

* REVOLT ENTRY LEFT OUT BECAUSE IT IS A PERFECT PREDICTOR
streg  legislature  leg_growth_2 gdp_1k chgdpen_fearonlaitin Oil_fearonlaitin postcoldwarlag civiliandictatorshiplag militarydictatorshiplag communist lpopl1_fearonlaitin ethfrac_fearonlaitin relfrac_fearonlaitin  age, distribution(weibull) time
outreg2 using survival, replace ctitle(coups, partial) tex nonotes bdec(3) e(all)
stcurve, hazard

I ran the code in Stata and the results are identical to those in the published manuscript. I am now trying to reproduce them in R, but I have had no luck. The estimates differ by quite a bit, which suggests the discrepancy comes from how I set up the model rather than from ordinary numerical differences between R and Stata. Here is the R code I wrote:

# load required libraries
library(survival)
library(haven)

# load data 
leader_tvc_2 <- read_dta("leader_tvc_2.dta", encoding = "latin1")

# create survival object 
surv_obj_coup <-
  Surv(time = leader_tvc_2$time, event = leader_tvc_2$c_coup)

# fit a survival regression model in R
surv_model <- survreg(
  surv_obj_coup ~ legislature + leg_growth_2 + gdp_1k + chgdpen_fearonlaitin + Oil_fearonlaitin + postcoldwarlag + civiliandictatorshiplag + militarydictatorshiplag + communist + lpopl1_fearonlaitin + ethfrac_fearonlaitin + relfrac_fearonlaitin + age,
  data = leader_tvc_2,
  dist = "weibull"
)

# summarize results
summary(surv_model)

Does anyone know why I am not getting roughly the same results? Am I not implementing the Stata code correctly in R? Any advice would be appreciated! Thanks.
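
For comparison, the two Stata preparation lines (`gen c_coup=c` and `replace c_coup=0 if exit!="coup"`) could be mirrored in R roughly as below. This is only a sketch on made-up toy data: the column names `c`, `exit`, `leadid` and `time` are taken from the Stata code, but the values are invented and the real file may differ. It also counts rows per `leadid`, because `stset` with `id()` treats multiple rows for the same id as successive spans of one subject, whereas `Surv(time, event)` treats every row as a separate subject observed from time 0.

```r
# Toy stand-in for the imported data. Column names come from the Stata
# code above; the values are invented purely for illustration.
toy <- data.frame(
  leadid = c(1, 1, 2, 3, 3, 3),
  time   = c(1, 2, 1, 1, 2, 3),
  c      = c(0, 1, 1, 0, 0, 1),
  exit   = c("coup", "coup", "natural", "coup", "coup", "coup"),
  stringsAsFactors = FALSE
)

# Mirror `gen c_coup=c` followed by `replace c_coup=0 if exit!="coup"`
toy$c_coup <- ifelse(toy$exit != "coup", 0, toy$c)

# Count rows per leader. More than one row per leadid means that
# `stset ..., id(leadid)` is working with multiple records per subject.
rows_per_leader <- table(toy$leadid)
print(toy)
print(any(rows_per_leader > 1))
```

If the real data does have several rows per leader, that difference in how the two programs define the time at risk would be one place to look before comparing coefficients.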

You should make your example reproducible either by including a subset of your data with `dput()` or by using a dataset that is available in R and Stata. — bretauv, Apr 13 '23 at 15:01
I agree with @bretauv. Additionally, you should check how Stata and R import your data. If you have any string columns, results will change depending on whether they are treated as factors or as ordered factors. But I think that should not be a problem if you use only numerical variables. — juan_bzt, Apr 13 '23 at 18:00
To reproduce results from a published manuscript, it would be helpful to cite the publication. Those who may try to help will be able to access the data. — Zhiqiang Wang, Apr 17 '23 at 00:39
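
The two practical suggestions in the comments above (checking how columns are imported, and sharing a small subset with `dput()`) might look something like the following. This is a minimal sketch on a hypothetical toy data set written to a temporary `.dta` file. The `regime` column and all values are invented for illustration; only `gdp_1k` and `exit` are names that appear in the question.

```r
library(haven)

# Hypothetical toy data: one numeric, one string and one value-labelled column
toy <- data.frame(
  gdp_1k = c(1.2, 3.4, 5.6),
  exit   = c("coup", "natural", "coup"),
  stringsAsFactors = FALSE
)
toy$regime <- labelled(c(1, 2, 1), labels = c(civilian = 1, military = 2))

# Round-trip through a temporary Stata file to see what read_dta() returns
path <- tempfile(fileext = ".dta")
write_dta(toy, path)
imported <- read_dta(path)

# First class of each column as R sees it after import
col_classes <- sapply(imported, function(x) class(x)[1])
print(col_classes)

# Value-labelled columns can be converted to factors explicitly if needed
imported$regime_f <- as_factor(imported$regime)

# Print a small subset in a form that can be pasted into a question
dput(as.data.frame(head(imported[, c("gdp_1k", "exit")], 3)))
```

Running the same class check on the real imported data would show whether any covariate in the model formula is a string or labelled column rather than a plain number.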

0 Answers