1

I'm hoping any of you could shed some light on the following. I have been attempting to replicate a Cox PH model from Stata in R. As you can see below, I get the same results for Cox PH models without tvcs in both programs:

Stata Cox PH model

stset date_endpoint, failure(cause_endpoint2==4) exit(failure) origin(time capture_date) id(wolf_ID)

                id:  wolf_ID
     failure event:  cause_endpoint2 == 4
obs. time interval:  (date_endpoint[_n-1], date_endpoint]
 exit on or before:  failure
    t for analysis:  (time-origin)
            origin:  time capture_date

--------------------------------------------------------------------------
      5,664  total observations
          0  exclusions
--------------------------------------------------------------------------
      5,664  observations remaining, representing
        513  subjects
        231  failures in single-failure-per-subject data
  279,430.5  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =     3,051

stcox deer_hunt bear_hunt, strata(winter lib_kill) efron robust cluster(wolf_ID)

         failure _d:  cause_endpoint2 == 4
   analysis time _t:  (date_endpoint-origin)
             origin:  time capture_date
                 id:  wolf_ID

Iteration 0:   log pseudolikelihood = -993.65252
Iteration 1:   log pseudolikelihood = -992.55768
Iteration 2:   log pseudolikelihood = -992.55733
Refining estimates:
Iteration 0:   log pseudolikelihood = -992.55733

Stratified Cox regr. -- Efron method for ties

No. of subjects      =          513         Number of obs    =       5,664
No. of failures      =          231
Time at risk         =     279430.5
                                            Wald chi2(2)     =        2.21
Log pseudolikelihood =   -992.55733         Prob > chi2      =      0.3317

                         (Std. Err. adjusted for 513 clusters in wolf_ID)
--------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.   z    P>|z|   [95% Conf.Interval]
-------------+------------------------------------------------------------
   deer_hunt | .7860433   .1508714    -1.25   0.210     .539596   1.145049
   bear_hunt | 1.21915    .2687211     0.90   0.369     .7914762  1.877916
--------------------------------------------------------------------------
                                             Stratified by winter lib_kill

R Cox PH model

> LTF.coxph <- coxph(Surv(`_t0`,`_t`, endpoint_r_enc=="ltf") ~ deer_hunt 
+                        + bear_hunt + strata(winter, lib_kill), data=statadta, ties = "efron", id = wolf_ID)
> summary(LTF.coxph)
Call:
coxph(formula = Surv(`_t0`, `_t`, endpoint_r_enc == "ltf") ~ 
    deer_hunt + bear_hunt + strata(winter, lib_kill), data = statadta, 
    ties = "efron", id = wolf_ID)

  n= 5664, number of events= 231 

             coef exp(coef) se(coef)      z Pr(>|z|)
deer_hunt -0.2407    0.7860   0.1849 -1.302    0.193
bear_hunt  0.1982    1.2191   0.2174  0.911    0.362

          exp(coef) exp(-coef) lower .95 upper .95
deer_hunt     0.786     1.2722    0.5471     1.129
bear_hunt     1.219     0.8202    0.7962     1.867

Concordance= 0.515  (se = 0.022 )
Likelihood ratio test= 2.19  on 2 df,   p=0.3
Wald test            = 2.21  on 2 df,   p=0.3
Score (logrank) test = 2.22  on 2 df,   p=0.3

> cox.zph(LTF.coxph)
           chisq df     p
deer_hunt 5.5773  1 0.018
bear_hunt 0.0762  1 0.783
GLOBAL    5.5773  2 0.062

The problem I have is that results look very different when adding a time-varying coefficient (tvc() in Stata and tt() in R) for one of the variables in my model. Nothing is the same between models (coefficients for all variables or their significance).

Stata Cox PH model with tvc()

stcox deer_hunt bear_hunt, tvc(deer_hunt) strata(winter lib_kill) efron robust cluster(wolf_ID)

         failure _d:  cause_endpoint2 == 4
   analysis time _t:  (date_endpoint-origin)
             origin:  time capture_date
                 id:  wolf_ID

Iteration 0:   log pseudolikelihood = -993.65252
Iteration 1:   log pseudolikelihood = -990.70475
Iteration 2:   log pseudolikelihood = -990.69386
Iteration 3:   log pseudolikelihood = -990.69386
Refining estimates:
Iteration 0:   log pseudolikelihood = -990.69386

Stratified Cox regr. -- Efron method for ties

No. of subjects      =          513         Number of obs    =       5,664
No. of failures      =          231
Time at risk         =     279430.5
                                            Wald chi2(3)     =        4.72
Log pseudolikelihood =   -990.69386         Prob > chi2      =      0.1932

                          (Std. Err. adjusted for 513 clusters in wolf_ID)
--------------------------------------------------------------------------
             |               Robust
          _t | Haz. Ratio   Std. Err.    z    P>|z|    [95% Conf Interval]
-------------+------------------------------------------------------------
main         |
   deer_hunt | 1.043941   .2643779     0.17   0.865     .6354908  1.714915
   bear_hunt | 1.204522   .2647525     0.85   0.397     .7829279  1.853138
-------------+------------------------------------------------------------
tvc          |
   deer_hunt | .9992683   .0004286    -1.71   0.088     .9984287  1.000109
------------------------------------------------------------------------------
                                             Stratified by winter lib_kill
Note: Variables in tvc equation interacted with _t.

R Cox PH model with tt()

> LTF.tvc1.coxph <- coxph(Surv(`_t0`,`_t`, endpoint_r_enc=="ltf") ~ deer_hunt + bear_hunt + tt(deer_hunt) + strata(winter, lib_kill), 
+                    data=statadta, ties = "efron", id = wolf_ID, cluster = wolf_ID, 
+                    tt=function(x,t,...){x*t})
> summary(LTF.tvc1.coxph)
Call:
coxph(formula = Surv(`_t0`, `_t`, endpoint_r_enc == "ltf") ~ 
    deer_hunt + bear_hunt + tt(deer_hunt) + strata(winter, lib_kill), 
    data = statadta, ties = "efron", tt = function(x, t, ...) {
        x * t
    }, id = wolf_ID, cluster = wolf_ID)

  n= 5664, number of events= 231 

                    coef  exp(coef)   se(coef)  robust se      z Pr(>|z|)    
deer_hunt    0.4741848  1.6067039  0.2082257 0.2079728  2.280  0.02261 *  
bear_hunt   -0.7923208  0.4527927  0.1894531 0.1890497 -4.191 2.78e-05 ***
tt(deer_hunt)-0.0009312 0.9990692 0.0003442  0.0003612 -2.578  0.00994 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

              exp(coef) exp(-coef) lower .95 upper .95
deer_hunt        1.6067     0.6224    1.0688    2.4153
bear_hunt        0.4528     2.2085    0.3126    0.6559
tt(deer_hunt)    0.9991     1.0009    0.9984    0.9998

Concordance= 0.634  (se = 0.02 )
Likelihood ratio test= 28.29  on 3 df,   p=3e-06
Wald test            = 25.6  on 3 df,   p=1e-05
Score (logrank) test = 26.19  on 3 df,   p=9e-06,   Robust = 32.6  p=4e-07

Moreover, I checked this post before posting this because I did not find it very helpful. The 'noadjust' Stata command was useful for SEs, but it does not answer my main issue of also getting different covariate coefficients between programs for the main and time-varying effects when I add those time-varying effects to the Cox model in each program (and the same formula for calculating the time-varying effects). That is really my main concern: the difference in covariate estimates seems substantial and would result in different prescriptions, I believe

I have been unable to figure out what is happening there, and am hoping the community can help.

Nick Cox
  • 35,529
  • 6
  • 31
  • 47
  • Does this answer your question? [Cox proportional hazard model in R vs Stata](https://stackoverflow.com/questions/33966004/cox-proportional-hazard-model-in-r-vs-stata) – TarJae Mar 11 '21 at 22:53
  • Hi TarJae, thanks for the response. I checked that post before posting this because I did not find it helpful. The 'noadjust' STATA command was useful for SEs, but it does not answer my issue of also getting different covariate coefficients between programs for the main and time-varying effects when I add those time-varying effects to the Cox model in each program (and the same formula for calculating the time-varying effects). That is really my main concern: the difference in covariate estimates seems substantial and would result in different prescriptions, I believe. – Francisco J. Santiago-Ávila Mar 14 '21 at 17:57
  • 1
    I think this needs an explicit data example readable in both R and Stata, full code and full results in both cases. Asking people to look at separate screenshots of results and hold the details in their heads to try to work out what is going on appears to be too optimistic. Everything has to be readable in one place. – Nick Cox Mar 14 '21 at 18:16
  • Thank you for the suggestions, @NickCox, and apologies for not doing that earlier. I changed the screenshots to code. I'd be happy to provide the data in both formats but I'm unable to figure out how to at the moment. Any assistance on that would be much appreciated. Thanks again for taking the time, and I hope the edits help. – Francisco J. Santiago-Ávila Mar 15 '21 at 16:34
  • Thanks for improving the presentation. It now needs someone to look at this who has experience in both programs with the method, which doesn't include me. – Nick Cox Mar 15 '21 at 16:38

0 Answers0