I'm hoping any of you could shed some light on the following. I have been attempting to replicate a Cox PH model from Stata in R. As you can see below, I get the same results for Cox PH models without tvcs in both programs:
Stata Cox PH model
stset date_endpoint, failure(cause_endpoint2==4) exit(failure) origin(time capture_date) id(wolf_ID)
id: wolf_ID
failure event: cause_endpoint2 == 4
obs. time interval: (date_endpoint[_n-1], date_endpoint]
exit on or before: failure
t for analysis: (time-origin)
origin: time capture_date
--------------------------------------------------------------------------
5,664 total observations
0 exclusions
--------------------------------------------------------------------------
5,664 observations remaining, representing
513 subjects
231 failures in single-failure-per-subject data
279,430.5 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 3,051
stcox deer_hunt bear_hunt, strata(winter lib_kill) efron robust cluster(wolf_ID)
failure _d: cause_endpoint2 == 4
analysis time _t: (date_endpoint-origin)
origin: time capture_date
id: wolf_ID
Iteration 0: log pseudolikelihood = -993.65252
Iteration 1: log pseudolikelihood = -992.55768
Iteration 2: log pseudolikelihood = -992.55733
Refining estimates:
Iteration 0: log pseudolikelihood = -992.55733
Stratified Cox regr. -- Efron method for ties
No. of subjects = 513 Number of obs = 5,664
No. of failures = 231
Time at risk = 279430.5
Wald chi2(2) = 2.21
Log pseudolikelihood = -992.55733 Prob > chi2 = 0.3317
(Std. Err. adjusted for 513 clusters in wolf_ID)
--------------------------------------------------------------------------
| Robust
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf.Interval]
-------------+------------------------------------------------------------
deer_hunt | .7860433 .1508714 -1.25 0.210 .539596 1.145049
bear_hunt | 1.21915 .2687211 0.90 0.369 .7914762 1.877916
--------------------------------------------------------------------------
Stratified by winter lib_kill
R Cox PH model
> LTF.coxph <- coxph(Surv(`_t0`,`_t`, endpoint_r_enc=="ltf") ~ deer_hunt
+ + bear_hunt + strata(winter, lib_kill), data=statadta, ties = "efron", id = wolf_ID)
> summary(LTF.coxph)
Call:
coxph(formula = Surv(`_t0`, `_t`, endpoint_r_enc == "ltf") ~
deer_hunt + bear_hunt + strata(winter, lib_kill), data = statadta,
ties = "efron", id = wolf_ID)
n= 5664, number of events= 231
coef exp(coef) se(coef) z Pr(>|z|)
deer_hunt -0.2407 0.7860 0.1849 -1.302 0.193
bear_hunt 0.1982 1.2191 0.2174 0.911 0.362
exp(coef) exp(-coef) lower .95 upper .95
deer_hunt 0.786 1.2722 0.5471 1.129
bear_hunt 1.219 0.8202 0.7962 1.867
Concordance= 0.515 (se = 0.022 )
Likelihood ratio test= 2.19 on 2 df, p=0.3
Wald test = 2.21 on 2 df, p=0.3
Score (logrank) test = 2.22 on 2 df, p=0.3
> cox.zph(LTF.coxph)
chisq df p
deer_hunt 5.5773 1 0.018
bear_hunt 0.0762 1 0.783
GLOBAL 5.5773 2 0.062
The problem I have is that results look very different when adding a time-varying coefficient (tvc() in Stata and tt() in R) for one of the variables in my model. Nothing is the same between models (coefficients for all variables or their significance).
Stata Cox PH model with tvc()
stcox deer_hunt bear_hunt, tvc(deer_hunt) strata(winter lib_kill) efron robust cluster(wolf_ID)
failure _d: cause_endpoint2 == 4
analysis time _t: (date_endpoint-origin)
origin: time capture_date
id: wolf_ID
Iteration 0: log pseudolikelihood = -993.65252
Iteration 1: log pseudolikelihood = -990.70475
Iteration 2: log pseudolikelihood = -990.69386
Iteration 3: log pseudolikelihood = -990.69386
Refining estimates:
Iteration 0: log pseudolikelihood = -990.69386
Stratified Cox regr. -- Efron method for ties
No. of subjects = 513 Number of obs = 5,664
No. of failures = 231
Time at risk = 279430.5
Wald chi2(3) = 4.72
Log pseudolikelihood = -990.69386 Prob > chi2 = 0.1932
(Std. Err. adjusted for 513 clusters in wolf_ID)
--------------------------------------------------------------------------
| Robust
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf Interval]
-------------+------------------------------------------------------------
main |
deer_hunt | 1.043941 .2643779 0.17 0.865 .6354908 1.714915
bear_hunt | 1.204522 .2647525 0.85 0.397 .7829279 1.853138
-------------+------------------------------------------------------------
tvc |
deer_hunt | .9992683 .0004286 -1.71 0.088 .9984287 1.000109
------------------------------------------------------------------------------
Stratified by winter lib_kill
Note: Variables in tvc equation interacted with _t.
R Cox PH model with tt()
> LTF.tvc1.coxph <- coxph(Surv(`_t0`,`_t`, endpoint_r_enc=="ltf") ~ deer_hunt + bear_hunt + tt(deer_hunt) + strata(winter, lib_kill),
+ data=statadta, ties = "efron", id = wolf_ID, cluster = wolf_ID,
+ tt=function(x,t,...){x*t})
> summary(LTF.tvc1.coxph)
Call:
coxph(formula = Surv(`_t0`, `_t`, endpoint_r_enc == "ltf") ~
deer_hunt + bear_hunt + tt(deer_hunt) + strata(winter, lib_kill),
data = statadta, ties = "efron", tt = function(x, t, ...) {
x * t
}, id = wolf_ID, cluster = wolf_ID)
n= 5664, number of events= 231
coef exp(coef) se(coef) robust se z Pr(>|z|)
deer_hunt 0.4741848 1.6067039 0.2082257 0.2079728 2.280 0.02261 *
bear_hunt -0.7923208 0.4527927 0.1894531 0.1890497 -4.191 2.78e-05 ***
tt(deer_hunt)-0.0009312 0.9990692 0.0003442 0.0003612 -2.578 0.00994 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
deer_hunt 1.6067 0.6224 1.0688 2.4153
bear_hunt 0.4528 2.2085 0.3126 0.6559
tt(deer_hunt) 0.9991 1.0009 0.9984 0.9998
Concordance= 0.634 (se = 0.02 )
Likelihood ratio test= 28.29 on 3 df, p=3e-06
Wald test = 25.6 on 3 df, p=1e-05
Score (logrank) test = 26.19 on 3 df, p=9e-06, Robust = 32.6 p=4e-07
Moreover, I checked this post before posting this because I did not find it very helpful. The 'noadjust' Stata command was useful for SEs, but it does not answer my main issue of also getting different covariate coefficients between programs for the main and time-varying effects when I add those time-varying effects to the Cox model in each program (and the same formula for calculating the time-varying effects). That is really my main concern: the difference in covariate estimates seems substantial and would result in different prescriptions, I believe
I have been unable to figure out what is happening there, and am hoping the community can help.