The code is lifted directly from DataCamp's Marketing Analytics in R module and applied to new customer data, but I am stuck on what to do with the results after applying the model to a new data set.
I have a Cox PH model with constant covariates, fitted as shown below:
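    library(survival)  # Surv(), survfit()
    library(rms)       # cph()

    # Cox PH model: time to purchase as a function of customer attributes
    fitCPH1 <- cph(Surv(tenure, purchase) ~ gender +
                     maritalstatus + age + monthlypurchase,
                   data = customer,
                   x = TRUE,
                   y = TRUE,
                   surv = TRUE,
                   time.inc = 1)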
I've validated the model and now want to apply it to a new data set (mockcustomerdata2.csv, with 3 test rows):
    newdata <- read.csv(file = "mockcustomerdata2.csv",
                        header = TRUE,
                        stringsAsFactors = TRUE,
                        row.names = 1,
                        sep = ",")
and then ran:
    survfit(formula = fitCPH1, newdata = newdata)
Running that line, I get a three-line result showing n, events, median (the median time to event for each new data point), and 0.95LCL/0.95UCL:
         n events median 0.95LCL 0.95UCL
    1 1000    281    332     305     361
    2 1000    281    320     297     350
    3 1000    281    322     298     355
What I want is to extract this summary for each data point and merge it with my new data set, so that each row has the predicted median time to event along with its lower and upper bounds.
Is this possible, and how do I do this?
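To make the target concrete, here is a minimal sketch of the shape of output I am after. It is not my real setup: it uses coxph() and the lung data from the survival package as stand-ins for my cph() model and customer data, and the three new rows are made up. It relies on summary() of the survfit object exposing the printed columns in its table element; I have not confirmed that the same works with the fitCPH1 object from rms.

```r
library(survival)

# Stand-in model: coxph() on the lung data shipped with survival
# (an assumption for illustration, not the cph() fit on my customer data)
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Three hypothetical new rows, playing the role of mockcustomerdata2.csv
newdata_demo <- data.frame(age = c(50, 60, 70), sex = c(1, 2, 1))

# One predicted survival curve per row of newdata_demo
sf <- survfit(fit, newdata = newdata_demo)

# summary()$table is a matrix with one row per curve; its columns
# include "median", "0.95LCL" and "0.95UCL"
tab <- summary(sf)$table

# Bind the median and its confidence bounds onto the new rows
result <- cbind(newdata_demo, tab[, c("median", "0.95LCL", "0.95UCL")])
print(result)
```

If something equivalent works for a cph fit, then replacing fit with fitCPH1 and newdata_demo with newdata would give exactly the merged table I want.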