I need to add a column of predicted hazard ratios to a dataframe after running a Cox PH regression in R. The dataframe is panel data in which numgvkey is the firm identifier and age is the time identifier. You can download a small section of the data from this link: https://drive.google.com/file/d/0B8usDJAPeV85VFRWd01pb0h1MDA/view?usp=sharing

I have done the following:

library(survival)
library(readstata13)
sme <- read.dta13("sme.dta")
reg <- coxph(Surv(age, EVENT2) ~ L1FETA + frailty(numgvkey), ties = "efron", data = sme)
summary(reg)
hr <- predict(reg, type="risk")

How can I add a 5th column of "Hazard Ratio"(hr) in my 'sme' dataframe? Also, is there any way to predict the EVENT2 probability rather than 'hr'?

Frank
Jairaj Gupta

1 Answer

The predict.coxph function can return several different types of output, selected with its "type" argument. One of them is "expected", which may be what you mean by "probability". It is the expected number of events given the covariates and follow-up time, so it is not really a probability: the values can exceed 1.0 when the relative risk, the baseline hazard, and the time under observation are high.

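The "risk" option for "type" returns the risk score exp(lp), that is, the hazard ratio relative to the reference covariate values (by default the means in the fitted data).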
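As a minimal sketch of attaching these predictions to the data frame: the example below uses the `lung` data set that ships with the survival package as a stand-in for `sme` (an assumption, since the question's file is not reproduced here), and it leaves out the frailty term. The names `dat`, `event`, `hr`, `expected` and `p_event` are made up for illustration. The last column relies on the relation given in the predict.coxph help page, where a subject's survival probability equals exp(-expected).

```r
library(survival)

# Stand-in data (assumption): `lung` from the survival package
# takes the place of the question's `sme` data frame.
dat <- lung
dat$event <- as.integer(dat$status == 2)   # 1 = death, 0 = censored

# na.exclude makes predict() return NA for rows that were dropped
# because of missing covariates, so each result has one value per row.
fit <- coxph(Surv(time, event) ~ age + ph.ecog, data = dat,
             ties = "efron", na.action = na.exclude)

dat$hr       <- predict(fit, type = "risk")      # exp(linear predictor)
dat$expected <- predict(fit, type = "expected")  # expected number of events
dat$p_event  <- 1 - exp(-dat$expected)           # 1 - survival at each row's own time

head(dat[, c("time", "event", "hr", "expected", "p_event")])
```

With the default na.action (na.omit), rows with missing covariates are dropped from the prediction vector, so a direct assignment to a new column can fail with a length mismatch; that is the reason for na.exclude here.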

There is also a survfit.coxph method, which calculates predicted survival. The object it returns has both surv and cumhaz components.

You might want to try this:

sme$cumhaz <- survfit(reg, newdata=sme)$cumhaz
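One thing to keep in mind is that, when newdata has several rows, the surv and cumhaz components are matrices with one row per time point and one column per row of newdata, so a single time point usually has to be picked before storing the result as a column. The sketch below illustrates this on the `lung` data from the survival package (a stand-in, not the question's data). The data frame `newd`, the 365-day horizon and the column name `p_event_365` are hypothetical choices. The model has no frailty() term; whether survfit() accepts newdata for a frailty model is not verified here.

```r
library(survival)

# Stand-in model (assumption): `lung` data, one covariate, no frailty term.
fit <- coxph(Surv(time, status) ~ age, data = lung)

# Hypothetical covariate profiles to predict for.
newd <- data.frame(age = c(50, 60, 70))

sf <- survfit(fit, newdata = newd)
dim(sf$cumhaz)   # rows = time points, columns = rows of newd

# Pick one horizon (365 days here) and extract survival for each profile.
s365 <- summary(sf, times = 365)
surv365 <- as.vector(s365$surv)

# Predicted probability of the event by day 365 for each profile.
newd$p_event_365 <- 1 - surv365
newd
```

Unlike type="expected", which is evaluated at each subject's own follow-up time, this gives every row a prediction at the same horizon, which makes the values comparable across rows.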
IRTFM
  • thanks for your answer. What about multiple event coxph models, for example, modeling patients becoming ill and becoming ill again? Using `survfit(fit, newdata=sme)$cumhaz`, can we interpret the output as 'has become sick again' when cumhaz is >2? – NickBraunagel Sep 14 '19 at 06:41
  • I don't see it as feasible unless you take the position that the time following each event is independent of the time preceding it. Doesn't sound biologically plausible that there would be such independence. If you were able to make that case, then you would need to have separate lines of data for each time to event episode. You would also need to cluster. I'd be thinking about using Poisson regression instead. – IRTFM Sep 14 '19 at 09:08
  • Any chance you can answer this question? It is related to multi-event survival analysis. https://stats.stackexchange.com/questions/427215/r-correctly-interpret-survival-curve-for-multiple-event-coxph-model – NickBraunagel Sep 14 '19 at 16:48
  • I took a shot at it. – IRTFM Sep 14 '19 at 19:01
  • did you delete your answer / comment? I don't see anything on my question on cross validated – NickBraunagel Sep 14 '19 at 19:06
  • No. It's still there for me. Perhaps you need to "reload" the page? – IRTFM Sep 14 '19 at 19:10