0

I am reproducing in R some survival analysis results published in a journal. The original results were produced in Stata. Here are the original results:

Original Results

Here is the code to produce these results in R:

# load packages
library(dplyr)
library(foreign)
library(msm)
library(stargazer)

# load Svolik's original data 
data = read_stata("leaders, institutions, covariates, updated tvc.dta")

# set a t0 for each row
data = mutate(data,t0 = lag(t,default=0), .by=leadid)

# coup survival object original
survobj_coup = Surv(data[["t0"]], data[["_t"]], data$c_coup)

# coup model original
coups_original <- coxph(survobj_coup~legislature +  lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ communist+ mil+ cw+ age, 
      data=data, ties="breslow")

# revolt survival object original 
survobj_revolt = Surv(data[["t0"]], data[["_t"]], data$c_revolt)

# revolt model original 
revolt_original <- coxph(survobj_revolt~legislature +  lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ mil+ cw+ age, 
                        data=data, ties="breslow")

# natural survival object original
survobj_natural = Surv(data[["t0"]], data[["_t"]], data$c_natural)

# natural model original
natural_original <- coxph(survobj_natural~legislature +  lgdp_1+ growth_1 +exportersoffuelsmainlyoil_EL2008+ ethfrac_FIXED+ communist+ mil+ cw+ age, 
                        data=data, ties="breslow")

# Define a function to exponentiate coefficients
exp_coef <- function(x) {exp(x) }

# Create the table using stargazer
stargazer(natural_original, coups_original, revolt_original, apply.coef = exp_coef, p.auto = FALSE)

While I am able to produce the exact same coefficients (save for slight differences in rounding) with the exact same significance levels, the standard errors do not match. For example, in Model 1 in the figure (first column in Natural Causes), I obtain a standard error of 0.414 rather than 0.198 for the coefficient on Legislature (0.456*). I was reading that the differences may be due to how the standard errors are transformed (something to do with the delta method perhaps). Does anyone have any advice? Thanks.

Phil
  • 7,287
  • 3
  • 36
  • 66
w5698
  • 159
  • 7

1 Answers1

2

Quick answer is they measure different things. In your example, Stata reports approximate standard errors of HRs, while R reports SE of coefficient se(coef). I used an example from Stata manual:

library(survival)
library(webuse)

webuse("drugtr")
m_1 <- coxph(Surv(studytime, died) ~ drug + age, drugtr)
summary(m_1)

SE for 'drug' is .0477017 in Stata (SE for HR), but 0.46052 in R (SE for coefficient).

To get the same (or similar) SE values as Stata, you could borrow some functions from R packages. I used deltaMethod in car package.:

library(car)
deltaMethod(m_1, "exp(drug)")

Now it is 0.044426, similar to the value using Stata. There may be other alternatives.

Zhiqiang Wang
  • 6,206
  • 2
  • 13
  • 27