5

I´m trying to replicate in R a cox proportional hazard model estimation from Stata using the following data http://iojournal.org/wp-content/uploads/2015/05/FortnaReplicationData.dta

The command in stata is the following:

stset enddate2009, id(VPFid) fail(warends) origin(time startdate)
stcox HCTrebels o_rebstrength demdum independenceC transformC lnpop lngdppc africa diffreligion warage if keepobs==1, cluster(js_country)

Cox regression -- Breslow method for ties

No. of subjects      =          104                Number of obs   =       566
No. of failures      =           86
Time at risk         =       194190
                                               Wald chi2(10)   =     56.29
Log pseudolikelihood =   -261.94776                Prob > chi2     =    0.0000

                           (Std. Err. adjusted for 49 clusters in js_countryid)
-------------------------------------------------------------------------------
              |               Robust
           _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
    HCTrebels |   .4089758   .1299916    -2.81   0.005     .2193542    .7625165
o_rebstrength |   1.157554   .2267867     0.75   0.455     .7884508    1.699447
       demdum |   .5893352   .2353317    -1.32   0.185     .2694405    1.289027
independenceC |   .5348951   .1882826    -1.78   0.075      .268316    1.066328
   transformC |   .5277051   .1509665    -2.23   0.025     .3012164    .9244938
        lnpop |   .9374204   .0902072    -0.67   0.502     .7762899    1.131996
      lngdppc |   .9158258   .1727694    -0.47   0.641     .6327538    1.325534
       africa |   .5707749   .1671118    -1.92   0.055     .3215508    1.013165
 diffreligion |   1.537959   .4472004     1.48   0.139      .869834    2.719275
       warage |   .9632408   .0290124    -1.24   0.214     .9080233    1.021816
-------------------------------------------------------------------------------

With R, I´m using the following:

data <- read.dta("FortnaReplicationData.dta")
data4 <- subset(data, keepobs==1)
data4$end_date <- data4$`_t`
data4$start_date <- data4$`_t0`
levels(data4$o_rebstrength) <- c(0:4)
data4$o_rebstrength <- as.numeric(levels(data4$o_rebstrength[data4$o_rebstrength])
data4 <- data4[,c("start_date", "end_date","HCTrebels",  "o_rebstrength", "demdum", "independenceC", "transformC", "lnpop", "lngdppc", "africa", "diffreligion", "warage", "js_countryid", "warends")]
data4 <- na.omit(data4)
surv <- coxph(Surv(start_date, end_date, warends) ~ HCTrebels+ o_rebstrength +demdum + independenceC+ transformC+ lnpop+ lngdppc+ africa +diffreligion+ warage+cluster(js_countryid), data = data4, robust = TRUE, method="breslow")

                 coef exp(coef) se(coef) robust se     z      p
HCTrebels     -0.8941    0.4090   0.3694    0.3146 -2.84 0.0045
o_rebstrength  0.1463    1.1576   0.2214    0.1939  0.75 0.4505
demdum        -0.5288    0.5893   0.4123    0.3952 -1.34 0.1809
independenceC -0.6257    0.5349   0.3328    0.3484 -1.80 0.0725
transformC    -0.6392    0.5277   0.3384    0.2831 -2.26 0.0240
lnpop         -0.0646    0.9374   0.1185    0.0952 -0.68 0.4974
lngdppc       -0.0879    0.9158   0.2060    0.1867 -0.47 0.6377
africa        -0.5608    0.5708   0.3024    0.2898 -1.94 0.0530
diffreligion   0.4305    1.5380   0.3345    0.2878  1.50 0.1347
warage        -0.0375    0.9632   0.0405    0.0298 -1.26 0.2090

Likelihood ratio test=30.1  on 10 df, p=0.000827
n= 566, number of events= 86 

I get the same hazard ratio coefficients but the standard errors does not look the same. The Z and p values are close but not exactly the same. Why might be the difference between the results in R and Stata?

user2246905
  • 1,029
  • 1
  • 12
  • 31
  • a couple of comments (most likely unhelpful) . For the R results, the asymptotic and robust se are pretty close, which i tend to find reassuring, and the z-statistic can be seen to be calculated from coef / rob.se. I cannot seem to calculate the z-stat from the stata results (log(HR) / rob.se isnt it) - do you know why/how? Suggests the st.errors have been transformed maybies?? – user20650 Nov 28 '15 at 01:10
  • I think in someway the se may be transformed but I really don´t have a clear idea of how or if they really are transformed. – user2246905 Nov 28 '15 at 01:22
  • Im guessing wildly but did you try specifying `nohr` to your stata code.. – user20650 Nov 28 '15 at 01:24
  • Typing nohr gives me standard errors pretty close to that shown in R but still there is a small difference. This are the standard errors shown in stata HCTrebels | .3178468 o_rebstrength | .195919 demdum | .3993173 independenceC | .3519991 transformC | .2860811 – user2246905 Nov 28 '15 at 01:27
  • Yup, a lot closer, and we can at least calculate the z-stat. My feeling also that they are a bit too different to be due to the algorithm, which suggets model specification. Im not familiar with stata so cant help .. sorry .(Did you compare the results without specifying the clusters??) – user20650 Nov 28 '15 at 01:32
  • I just did it, and using nohr I get exactly the same results in the standard errors. So the difference is only when specifying clusters. Do you know exactly which transformation to the standard errors is used so that I could get the same standard errors in R as when not specifying nohr in Stata? – user2246905 Nov 28 '15 at 01:46
  • RE you last comment, nope, but id generally always get these results on the log scale... so you can calculate z-scores / confidence intervals etc – user20650 Nov 28 '15 at 01:58
  • 1
    haha... got it !!! dug out an old lappie with stata.. add `noadjust`. [the manual](https://www.google.co.uk/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwjHqvrVibLJAhXFfhoKHZOXA3sQFggeMAA&url=http%3A%2F%2Fwww.stata.com%2Fmanuals13%2Fststcox.pdf&usg=AFQjCNH9WpjLI6OHK0f1KdwWTE_yD5Eh7A) says a few words – user20650 Nov 28 '15 at 02:30
  • Thank you, now it is exactly the same. Do you know a way I could do that standard degree-of-freedom adjustment in R? – user2246905 Nov 28 '15 at 02:53
  • 1
    Using the adjust formula from pg3 in the manual ... `sqrt(diag(vcov(surv))* (49/48))` - might be worth automating the number of clusters – user20650 Nov 28 '15 at 03:15

1 Answers1

4

As user20650 noticed, when including "nohr" in the Stata options you get exactly the same standard errors as in R. Still there was a small difference in the standard errors when using clusters. user20650 again noticed that the difference was given because Stata default standard errors are multiplied g/(g − 1), where g is the number of cluster while R does not adjust these standard errors. So a solution is just to include noadjust in Stata or have the standard errors adjusted in R by doing:

sqrt(diag(vcov(surv))* (49/48))

If still we want in R to have the same standard errors from Stata, as when not specifying nohr, we need to know that when nhr is left off we obtain $exp(\beta)$ with the standard errors resulting from fitting the model in those scale. In particular obtained by applying the delta method to the original standard-error estimate. "The delta method obtains the standard error of a transformed variable by calculating the variance of the corresponding first-order Taylor expansion, which for the transform $exp(\beta)$ amounts to mutiplying the oringal standard error by $exp(\hat{\beta})$. This trick of calculation yields identical rsults as does transforming the parameters prior to estimation and then reestimating" (Cleves et al 2010). In R we can do it by using:

library(msm)
se <-diag(vcov(surv)* (49/48))
sapply(se, function(x) deltamethod(~ exp(x1), coef(surv)[which(se==x)], x))

     HCTrebels o_rebstrength    demdum independenceC transformC     lnpop   lngdppc    africa diffreligion     warage
     0.1299916     0.2267867 0.2353317     0.1882826  0.1509665 0.0902072 0.1727694 0.1671118    0.4472004 0.02901243
user2246905
  • 1,029
  • 1
  • 12
  • 31
  • Many thanks, very useful for me. i have the standard errors (0.7), HR(1.88) from STATA, however, as i don't have the data, how could i use R to get the standard error as in R. The number of cluster is 182. – user2669497 Oct 30 '16 at 06:21
  • i have came up using "(SE/HR)*(g-1/g)" to directly calculate SE from STATA into R SE. Using the HCTrebels as an example, (0.1299916/0.4089758)*(48/49)=0.31136, which is quite close to 0.3146 in R. – user2669497 Nov 01 '16 at 03:00