
I analyzed my data with the R package 'gbm'. Because the data come from a cohort study, I fit the 'gbm' model with distribution="coxph".

After fitting the model, I would like to see how well it predicts. However, as the code below shows, the predicted values are negative, and I have trouble understanding why. Please let me know how to interpret these values.

Here's my code.

install.packages("survival")
install.packages("randomForestSRC")
install.packages("gbm")

library(survival)
library(randomForestSRC)
library(gbm)

data(pbc, package="randomForestSRC")
data <- na.omit(pbc)

# All covariate names, i.e. every column except the survival time and event indicator
exposure <- setdiff(names(data), c("days", "status"))
formula <- as.formula(paste("Surv(days, status)~", paste(exposure, collapse="+")))

set.seed(123)
ex <- gbm(Surv(days, status)~., 
          data=data,
          distribution="coxph",
          cv.folds=5,
          shrinkage=.01,
          n.trees=1000)

set.seed(123)
pred <- predict(ex, n.trees=1000, type="response")
IRTFM
SJUNLEE
  • `Error: object 'data.model' not found` – IRTFM Sep 02 '18 at 15:58
  • @42- I'm sorry. The object "data.model" should have been "data". – SJUNLEE Sep 07 '18 at 02:37
  • I'm wondering where you found documentation describing the handling of `Surv` objects on the LHS of formulas given to `gbm`? I don't see any description of what should happen in the help pages. (So I'm wondering whether that means your gbm result is a prediction of survival times absent any consideration of censoring.) – IRTFM Sep 08 '18 at 01:22
  • https://cran.r-project.org/web/packages/gbm/gbm.pdf In this document, the examples on page 25 suggest using "Surv" objects with distribution="coxph". – SJUNLEE Sep 08 '18 at 03:15
  • Then the linear predictors are on the log scale, and the "responses" (if these are like estimates from `survival::predict.coxph`) would be calculated as `exp(pred)`; they would be relative risks of events relative to persons with the "average" of the covariates. The values I get are all positive, but only 3 out of 276 are above 1.0, so I'm a bit suspicious of the validity of this interpretation. – IRTFM Sep 08 '18 at 04:08
  • I should add that "machine learning" sorts of functions sometimes do behind-the-scenes scaling. I would have thought that the needed back-transformations would be part of any "predict" function, but you do need to check. (And there does not appear to be a "response" type in the code or help page for `gbm:::predict.gbm`.) – IRTFM Sep 08 '18 at 04:16

1 Answer


Read the `?predict.gbm` help page, particularly the `type` parameter. By default, predictions are on the link scale.
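A minimal base-R sketch of what "link scale" means here, assuming (as suggested in the comments on the question) that predictions from a `coxph` fit are log relative hazards. The numbers in `log_hazard` are made up for illustration and are not output from the model in the question.

```r
# Hypothetical link-scale predictions (log relative hazards).
# These values are invented for illustration only.
log_hazard <- c(-1.2, -0.3, 0, 0.8)

# Back-transform with exp() to get relative hazards.
# A log value can be negative, but its exponential is always positive.
rel_hazard <- exp(log_hazard)

# exp() is monotone, so the ordering of subjects by risk is unchanged.
same_ranking <- identical(order(log_hazard), order(rel_hazard))

print(rel_hazard)
print(same_ranking)
```

Under that assumption, a negative prediction just means a relative hazard below 1, and something like `exp(pred)` on the object from the question would put the values on a positive scale. Whether `predict.gbm` already applies such a back-transformation for `distribution="coxph"` is worth checking on the help page rather than taking from this sketch.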

user2554330
  • Thank you so much! But as I'm a newbie at machine learning analysis, could you please explain what "link scale" means? – SJUNLEE Sep 07 '18 at 02:44
  • This is off-topic for this site, so I'll be brief: many generalized linear models and related models like `gbm` transform the mean, e.g. by a log transformation. So a mean that is always positive is transformed to a log mean that can be positive or negative. That's the link scale. – user2554330 Sep 07 '18 at 05:07