Why are the predicted values of gbm (R package) negative?

Question

I analyzed my data with the 'gbm' R package. My data come from a cohort study, so I fit the 'gbm' model with distribution="coxph".

After fitting the model, I would like to see how well it predicts. However, as the code below shows, the predicted values are negative, and I have trouble understanding why. Please let me know how to interpret these values.

Here's my code.

install.packages("survival")
install.packages("randomForestSRC")
install.packages("gbm")

library(survival)
library(randomForestSRC)
library(gbm)

data(pbc, package="randomForestSRC")
data <- na.omit(pbc)

# Predictor names: every column except the survival time and the event indicator
exposure <- setdiff(names(data), c("days", "status"))
formula <- as.formula(paste("Surv(days, status)~", paste(exposure, collapse="+")))

set.seed(123)
ex <- gbm(Surv(days, status)~., 
          data=data,
          distribution="coxph",
          cv.folds=5,
          shrinkage=.01,
          n.trees=1000)

set.seed(123)
pred <- predict(ex, n.trees=1000, type="response")

@42- I'm sorry. object "data.model" should've been changed to "data" — SJUNLEE, Sep 07 '18 at 02:37
I'm wondering where you found documentation describing the handling of `Surv` objects on the LHS of formulas given to `gbm`. I don't see any description of what should happen in the help pages. (So I'm wondering whether that means your gbm result is a prediction of survival times absent any consideration of censoring?) — IRTFM, Sep 08 '18 at 01:22
https://cran.r-project.org/web/packages/gbm/gbm.pdf In this document, the examples on page 25 suggest using "Surv" objects with distribution="coxph" — SJUNLEE, Sep 08 '18 at 03:15
Then the linear predictors are on the log scale, and the "responses" (if these are like estimates from `survival::predict.coxph`) would be calculated as `exp(pred)`; they would be risks of events relative to persons with the "average" of the covariates. The values I get are all positive, but only 3 out of 276 are above 1.0, so I'm a bit suspicious of the validity of this interpretation. — IRTFM, Sep 08 '18 at 04:08
I should add that sometimes there is behind-the-scenes scaling done by "machine learning" sorts of functions. I would have thought that the needed back-transformations would be part of any "predict" function, but you do need to check. (And there does not appear to be a "response" type in the code or help page for `gbm:::predict.gbm`.) — IRTFM, Sep 08 '18 at 04:16

score 2 · Answer 1 · answered Sep 02 '18 at 16:57

Read the ?predict.gbm help page, particularly the parameter type. By default predictions are on the link scale.
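
As a minimal sketch of what this means for a `coxph` fit: the `predict.gbm` help page states that predictions for `coxph` are on the log hazard scale, so negative values are expected, and exponentiating them gives a positive relative hazard. The data set (`survival::veteran`), the two covariates, and the tuning values below are illustrative assumptions, not taken from the question.

```r
library(survival)
library(gbm)

# Illustrative data (an assumption): the veteran lung cancer set from 'survival'
d <- survival::veteran[, c("time", "status", "age", "karno")]

set.seed(123)
fit <- gbm(Surv(time, status) ~ age + karno,
           data = d,
           distribution = "coxph",
           n.trees = 200,
           shrinkage = 0.01)

# Link scale: log relative hazard, which can take either sign
log_hazard <- predict(fit, newdata = d, n.trees = 200, type = "link")

# Back-transform: relative hazard, which is always positive
rel_hazard <- exp(log_hazard)

summary(log_hazard)
summary(rel_hazard)
```

A larger value means a higher predicted hazard relative to the model's baseline, not a predicted survival time.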

user2554330

Thank you so much! But as I'm a newbie to machine learning analysis, could you please explain what "link scale" means? – SJUNLEE Sep 07 '18 at 02:44
This is off-topic for this site, so I'll be brief: many generalized linear models and related models like `gbm` transform the mean, e.g. by a log transformation. So a mean that is always positive is transformed to a log mean that can be positive or negative. That's the link scale. – user2554330 Sep 07 '18 at 05:07
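
The link scale can be illustrated with a plain Poisson GLM in base R, which uses a log link. This is only a sketch on simulated data (the coefficients and sample size are arbitrary assumptions): predictions on the link scale may be negative, while exponentiating them recovers the always-positive mean.

```r
# Simulated count data whose true mean is below 1, so the log mean is negative
set.seed(1)
x <- runif(100)
y <- rpois(100, lambda = exp(-2 + x))

fit <- glm(y ~ x, family = poisson)

eta <- predict(fit, type = "link")      # log of the fitted mean, any sign
mu  <- predict(fit, type = "response")  # fitted mean, always positive

# The two scales are related by the inverse link, here exp()
all.equal(exp(eta), mu)
range(eta)
range(mu)
```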
