I am using an accelerated failure time (AFT) model with a Weibull distribution to predict data, using the survival package in R. I split my data into training and test sets, fit the model on the training set, and then try to predict the values for the test set. To do that I pass the test set as the newdata parameter, as stated in the references. I get an error saying that newdata does not have the same size as the training data (obviously!). The function then seems to predict the values for the training set instead.

How can I predict the values for the new data?

# get data
library(KMsurv)
library(survival)
data("kidtran") 
n = nrow(kidtran)
kidtran <- kidtran[sample(n),] # shuffle row-wise
kidtran.train = kidtran[1:floor(n * 0.8),]
kidtran.test = kidtran[(floor(n * 0.8) + 1):n,] # start after the last training row so the sets do not overlap

# create model 
aftmodel <- survreg(kidtransurv~kidtran.train$gender+kidtran.train$race+kidtran.train$age, dist = "weibull")
predicted <- predict(aftmodel, newdata = kidtran.test)

Edit: As Hack-R pointed out, this line of code was missing:

kidtransurv <- Surv(kidtran.train$time, kidtran.train$delta)
B--rian
User12547645
  • I added your missing `library` statements that jumped out at me. We're still missing `kidtransurv` though. – Hack-R Jul 02 '18 at 16:45
  • Thanks for the update. It looks like I guessed correctly which columns you wanted in Y. You need to type it the way I did in my answer though (keeping the definition in the `survreg` function). Using `data = ` is not required, however it cleans up your code a lot and reduces the typing. – Hack-R Jul 02 '18 at 17:06

1 Answer

The problem seems to be in your specification of the dependent variable.

The definition of the dependent variable was missing from your question, so I can't see what the specific mistake was, but it did not appear to be a proper Surv() survival object (see ?survreg).

This variation on your code fixes that, makes some minor formatting improvements, and runs fine:

library(KMsurv)
library(survival)
data("kidtran")

n       <- nrow(kidtran)
n.train <- floor(n * 0.8)  # whole number of training rows

kidtran       <- kidtran[sample(n),]        # shuffle row-wise
kidtran.train <- kidtran[1:n.train,]
kidtran.test  <- kidtran[(n.train + 1):n,]  # starts after the last training row, so no overlap

# Whatever kidtransurv was supposed to be is missing from your question,
#   so I will replace it with something not-missing
#   and I will make it into a proper survival object with Surv()

aftmodel  <- survreg(Surv(time, delta) ~ gender + race + age, dist = "weibull", data = kidtran.train)
predicted <- predict(aftmodel, newdata = kidtran.test)


head(predicted)
       302        636        727        121         85        612 
 33190.413  79238.898 111401.546  16792.180   4601.363  17698.895
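
For what it's worth, the part that seems to matter is writing the formula with bare column names and passing `data =`. A term such as `kidtran.train$gender` always refers to the training data frame, so `predict()` has no way to replace it with the matching column from `newdata`. Below is a minimal, self-contained sketch of the working pattern. It assumes the `lung` data set that ships with survival as a stand-in for `kidtran` (so KMsurv is not needed); the seed and the names `d`, `train`, `test`, `fit` and `pred` are only illustrative.

```r
library(survival)

set.seed(1)                               # arbitrary seed, only for reproducibility
d       <- lung[sample(nrow(lung)), ]     # lung is a stand-in for kidtran; shuffle row-wise
n_train <- floor(0.8 * nrow(d))           # whole number of training rows
train   <- d[seq_len(n_train), ]
test    <- d[(n_train + 1):nrow(d), ]     # no overlap with the training rows

# Bare column names plus data = : predict() can look the same names up in newdata
fit  <- survreg(Surv(time, status) ~ age + sex, data = train, dist = "weibull")
pred <- predict(fit, newdata = test)

# One prediction per test row
length(pred) == nrow(test)
```

If the last line returns `TRUE`, the predictions were computed for the test rows rather than for the training rows.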
Hack-R
  • Thank you very much, it is working now. I assumed that defining `kidtransurv <- Surv(kidtran.train$time, kidtran.train$delta)` prior to the definition of the model would be equivalent to your version – User12547645 Jul 02 '18 at 17:06
  • @User12547645 You're welcome. I might've thought that too at first. – Hack-R Jul 02 '18 at 17:07