1

I would like to analysis my data based on the gradient boosted model.

On the other hand, as my data is a kind of cohort, I have a trouble understanding the result of this model.

Here's my code. Analysis was performed based on the example data.

install.packages("randomForestSRC")
install.packages("gbm")
install.packages("survival")

library(randomForestSRC)
library(gbm)
library(survival)

data(pbc, package="randomForestSRC")
data <- na.omit(pbc)

set.seed(9512)
train <- sample(1:nrow(data), round(nrow(data)*0.7))
data.train <- data[train, ]
data.test <- data[-train, ]

set.seed(9741)
gbm <- gbm(Surv(days, status)~.,
           data.train,
           interaction.depth=2,
           shrinkage=0.01,
           n.trees=500,
           distribution="coxph")

summary(gbm)


set.seed(9741)
gbm.pred <- predict.gbm(gbm, 
                        n.trees=500,
                        newdata=data.test, 
                        type="response") 

As I read the package documnet, "gbm.pred" is the result of cox's partial likelihood.

set.seed(9741)
lambda0 = basehaz.gbm(t=data.test$days, 
                      delta=data.test$status,  
                      t.eval=sort(data.test$days), 
                      cumulative = FALSE, 
                      f.x=gbm.pred, 
                      smooth=T)

hazard=lambda0*exp(gbm.pred)

In this code, lambda0 is a baseline hazard fuction.

So, according to formula: h(t/x)=lambda0(t)*exp(f(x))

"hazard" is hazard function.

However, what I've wanted to calculte was the "survival function".

Because, I would like to compare the outcome of original data (data$status) to the prediction result (survival function).

Please let me know how to calculate survival function.

Thank you

SJUNLEE
  • 167
  • 2
  • 14
  • 1
    The survival function is 1 minus the cumulative hazard function. (You are probably getting "hazard" predictions for each case. What you want are predictions at a sequence of times.) – IRTFM Sep 20 '18 at 07:28
  • 1
    Also the help page for `predict.gbm` says type="response only gives you backtransformed values for bernoulli and poisson. – IRTFM Sep 20 '18 at 07:55
  • @42-Thank you so much for your reply. The package document says "for the other distributions (except for bernoulli and poisson) "response" and "link" returen the same". Therefore, I think there is no option to change... However, as I'm a newbie in this field, could you tell me what the backtransformed values are shortly? I considered that it is a kind of log-transformed... but I'm not sure. – SJUNLEE Sep 20 '18 at 11:23
  • 1
    When you use `exp( fitted values)` you are "back transforming" the regression fit to get relative risks, since you used a log link when doing Poisson regression. That's why you were executing the code: `exp(gbm.pred)` since the results of the Cox regression were delivered as log-hazard. I think you also should be using cumulative=TRUE if you want to convert to survival (see my first comment). I haven't been a user of this package and have still not gotten its machinery to deliver what looks like a sensible cumulative hazard, hence no attempt at an answer, just comments about theory for now. – IRTFM Sep 20 '18 at 15:11
  • 1
    You might consider attmepting to generate a "prediction" for a set of covariates calculated at their means. Thats what is called the "baseline hazard" when doing work with results from the survival package. See `?basehaz` and `?survfit` (especially the second citations "Details" section.) – IRTFM Sep 20 '18 at 15:18
  • @42-I appreciate for your comments. You were a big help. – SJUNLEE Sep 24 '18 at 12:38

1 Answers1

1

Actually, the returns is cumulative baseline hazard function(integral part: \int^t\lambda(z)dz), and survival function can be computed as below:

s(t|X)=exp{-e^f(X)\int^t\lambda(z)dz}

f(X) is prediction of gbm, which is equal to log-hazard proportion.

I think this tutorial about gbm-based survival analysis would help to u!

https://github.com/liupei101/Tutorial-Machine-Learning-Based-Survival-Analysis/blob/master/Tutorial_Survival_GBM.ipynb

yuukilp
  • 21
  • 5