
The code is lifted directly from DataCamp's Marketing Analytics in R module and applied to new customer data, but I am stuck on what to do with the results after applying the model to a new data set.

I have a Cox PH model with constant covariates, fitted with the formula below:

library(rms)  # provides cph()

fitCPH1 <- cph(Surv(tenure, purchase) ~ gender +
                 maritalstatus + age + monthlypurchase,
               data = customer,
               x = TRUE,
               y = TRUE,
               surv = TRUE,
               time.inc = 1)  # cph() names this argument time.inc, not tenure.inc

I've validated the model in the meantime and now want to apply it to a new data set (mockcustomerdata2.csv, with 3 test rows):

newdata <- read.csv(file = "mockcustomerdata2.csv",
                    header = TRUE,
                    stringsAsFactors = TRUE,
                    row.names = 1,
                    sep = ",")

and then ran

survfit(formula = fitCPH1, newdata = newdata)

Running that line, I get a 3-row result, one row per new data point, showing n, events, median (the median predicted time until that data point has an event), and the 0.95 lower and upper confidence limits (0.95LCL/0.95UCL).

   |    n | events | median | 0.95LCL | 0.95UCL |
 1 | 1000 |    281 |    332 |     305 |     361 |
 2 | 1000 |    281 |    320 |     297 |     350 |
 3 | 1000 |    281 |    322 |     298 |     355 |

What I want to do is take this summary result and merge it with my new data set, so that each data point carries its expected time to event (the median) together with the lower and upper bounds.

Is this possible, and how do I do this?

J F

1 Answer


I solved the problem using surv_median() from the survminer package, which returns the table of results as a data frame that can then be merged with newdata. Hope this is helpful to someone!


library(survminer)  # provides surv_median()

results <- survfit(formula = fitCPH1, newdata = newdata)

medianvalues <- surv_median(results)  # turns results into a data frame

# The strata column needs to become the row names so it can be matched against newdata
medianvaluesdf <- data.frame(medianvalues[,-1], row.names = medianvalues[,1])

# Merge the converted data frame (medianvaluesdf, not medianvalues) on row names
merged <- merge(newdata, medianvaluesdf, by = "row.names")
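
For anyone who would rather not add survminer, the same table can probably be pulled straight from summary() of the survfit object. The sketch below is only an illustration under assumptions: it uses coxph() from the survival package on the built-in lung data as a stand-in for my cph() model, and the new_rows data frame with its cust_a/cust_b/cust_c row names is made up. I have not checked it against an rms fit, so treat it as a starting point.

```r
library(survival)  # provides coxph(), survfit(), Surv() and the lung data

# Stand-in model (assumption): coxph() on lung, NOT the cph() fit from the question
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Hypothetical new data points; the row names play the role of customer IDs
new_rows <- data.frame(age = c(50, 60, 70),
                       sex = c(1, 2, 1),
                       row.names = c("cust_a", "cust_b", "cust_c"))

# One predicted survival curve per row of new_rows
curves <- survfit(fit, newdata = new_rows)

# summary()$table is the matrix behind the printed output, one row per curve
tab <- summary(curves)$table

# Keep the median and its confidence limits, then bind them on by position
predicted <- cbind(new_rows, tab[, c("median", "0.95LCL", "0.95UCL"), drop = FALSE])
print(predicted)
```

Binding by position with cbind() relies on survfit() returning the curves in the same order as the rows of the new data, which avoids having to match on strata or row names at all.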

J F