
I am working on a telecom churn problem, and here is my dataset:

http://www.sgi.com/tech/mlc/db/churn.data

Column names: http://www.sgi.com/tech/mlc/db/churn.names

I'm new to survival analysis. Given the training data, my idea is to build a survival model that estimates the survival time, along with predicting churn/non-churn on the test data, based on the independent factors. Could anyone help me with code or pointers on how to approach this problem?

To be precise, say my training data contains customer call usage details, plan details, the tenure of each account, and whether or not the customer churned.

Using general classification models, I can predict churn or no churn on the test data. Now, using survival analysis, I want to predict the survival tenure for the test data.

Thanks, Maddy
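A minimal sketch of the prediction step, assuming R's survival package and simulated data in place of the SGI file (the column names account.length, total.day.charge and number.customer.service.calls mirror the dataset; the churn column, the train/test split and all values are hypothetical). A Cox model does not return a single tenure per customer; it returns a predicted survival curve, and the median of that curve is one common point estimate.

```r
library(survival)

# Hypothetical simulated stand-in for the churn data (NOT the SGI file)
set.seed(42)
n <- 1000
calls <- rpois(n, 2)
day_charge <- runif(n, 10, 50)
# Assumed true model: churn hazard rises with service calls and day charge
true_time <- rexp(n, rate = 0.005 * exp(0.4 * calls + 0.02 * day_charge))
censor_time <- runif(n, 50, 200)  # end of each customer's observation window
dat <- data.frame(
  account.length = pmin(true_time, censor_time),
  churn = as.integer(true_time <= censor_time),  # 1 = churned, 0 = censored
  number.customer.service.calls = calls,
  total.day.charge = day_charge
)

# Hypothetical train/test split
train_idx <- sample(n, 800)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

fit <- coxph(Surv(account.length, churn) ~ total.day.charge +
               number.customer.service.calls, data = train)

# One predicted survival curve per test customer
curves <- survfit(fit, newdata = test)

# Median predicted tenure per customer (NA if the curve stays above 0.5)
med <- as.vector(quantile(curves, probs = 0.5, conf.int = FALSE))

# Relative risk scores: higher means churn is expected sooner
risk <- predict(fit, newdata = test, type = "risk")
head(data.frame(median.tenure = med, risk = risk))
```

The risk score can be used to rank test customers much like a classifier's probability, while the median tenure answers the "how long" question.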

maddy
  • Are you using the survival package? Did you look at some examples under `?survival::survfit.coxph`? – rawr Nov 22 '14 at 17:23
  • http://stats.stackexchange.com/ is probably a better fit for your question. – NPE Nov 22 '14 at 17:45
  • If you don't even know what statistical system or general method to use, then posting on SO is not appropriate. StackOverflow is for questioners who know what they are doing and have a focused coding question. – IRTFM Nov 22 '14 at 18:10
  • I was actually trying to learn and had coded it too, but I wasn't able to interpret much from the output, so I wanted some expert help on the process. Anyway, I will keep trying and will learn it soon. – maddy Nov 22 '14 at 18:17
  • Your data doesn't seem to be suitable for survival analysis. There is no "time to event" column in your data. Survival analysis tells you the duration or longevity of the observations. For that you need a time of first observation and the time at death (churn). Here is a link to a blog post about survival analysis for marketing attribution, which is not dissimilar to analysing churn. – Andrie Nov 23 '14 at 07:00
  • As far as I understand, account length is the tenure in the dataset and churn/no churn is the event. I guess tenure is in weeks. Can I go ahead with this? – maddy Nov 23 '14 at 09:24
  • Why do you have the SAS tag? – Reeza Apr 29 '15 at 20:27
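A small sketch of how tenure plus a churn flag becomes a survival outcome, assuming the survival package and made-up rows (the values below are hypothetical, chosen only to mimic the " True." / " False." strings in churn.data). Customers who have not churned are right-censored: their account length is only a lower bound on their eventual tenure.

```r
library(survival)

# Hypothetical toy rows mimicking the churn.data layout
toy <- data.frame(
  account.length = c(120, 45, 200, 90, 60),
  Churn = c(" False.", " True.", " False.", " True.", " False.")
)

# Event indicator: 1 = churned (event observed), 0 = still active (censored)
toy$event <- as.integer(grepl("True", toy$Churn))

s <- with(toy, Surv(account.length, event))
print(s)  # censored tenures are printed with a trailing "+"
```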

2 Answers


If you're still interested (or for the benefit of those coming later), I've written a few guides specifically for conducting survival analysis on customer churn data using R. They cover a bunch of different analytical techniques, all with sample data and R code.

Basic survival analysis: http://daynebatten.com/2015/02/customer-churn-survival-analysis/

Basic Cox regression: http://daynebatten.com/2015/02/customer-churn-cox-regression/

Time-dependent covariates in Cox regression: http://daynebatten.com/2015/12/survival-analysis-customer-churn-time-varying-covariates/

Time-dependent coefficients in Cox regression: http://daynebatten.com/2016/01/customer-churn-time-dependent-coefficients/

Restricted mean survival time (quantify the impact of churn in dollar terms): http://daynebatten.com/2015/03/customer-churn-restricted-mean-survival-time/

Pseudo-observations (quantify dollar gain/loss associated with the churn effects of variables): http://daynebatten.com/2015/03/customer-churn-pseudo-observations/

Please forgive the goofy images.
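As a rough illustration of two of the techniques listed above (Kaplan-Meier curves and restricted mean survival time), here is a sketch assuming the survival package and simulated data; the plan variable, the hazard rates and the 200-unit horizon are all hypothetical. The restricted mean is the area under the survival curve up to the horizon, i.e. the expected tenure within that window.

```r
library(survival)

# Hypothetical simulated churn data: two plans with different churn hazards
set.seed(1)
n <- 500
plan <- factor(sample(c("basic", "premium"), n, replace = TRUE))
true_time <- rexp(n, rate = ifelse(plan == "basic", 0.010, 0.005))
censor_time <- runif(n, 50, 250)
tenure <- pmin(true_time, censor_time)
churned <- as.integer(true_time <= censor_time)

# Kaplan-Meier estimate of the retention curve for each plan
km <- survfit(Surv(tenure, churned) ~ plan)

# Restricted mean survival time up to a horizon of 200 (hypothetical choice)
tab <- summary(km, rmean = 200)$table
tab[, "rmean"]

# Log-rank test for a difference between the two curves
survdiff(Surv(tenure, churned) ~ plan)
```

Multiplying the difference in restricted means by revenue per unit of tenure is one way to express a churn effect in dollar terms.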

John Chrysostom

Here is some code to get you started:

First, read the data

nm <- read.csv("http://www.sgi.com/tech/mlc/db/churn.names", 
               skip=4, colClasses=c("character", "NULL"), header=FALSE, sep=":")[[1]]
dat <- read.csv("http://www.sgi.com/tech/mlc/db/churn.data", header=FALSE, col.names=c(nm, "Churn"))

Use Surv() to set up a survival object for modeling

library(survival)

# event = TRUE when the customer churned; non-churners are right-censored
s <- with(dat, Surv(account.length, grepl("True", Churn)))

Fit a Cox proportional hazards model and plot the result:

model <- coxph(s ~ total.day.charge + number.customer.service.calls, data=dat[, -4])
summary(model)
plot(survfit(model))
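To read the fitted model, a sketch like the following may help; it assumes the survival package and simulated data standing in for the churn file (all values and effect sizes are hypothetical). exp(coef) gives the hazard ratio per one-unit increase in a predictor, and cox.zph checks the proportional hazards assumption the model relies on.

```r
library(survival)

# Hypothetical simulated data with the two predictors used above
set.seed(7)
n <- 2000
dat <- data.frame(
  total.day.charge = runif(n, 10, 50),
  number.customer.service.calls = rpois(n, 2)
)
lp <- 0.03 * dat$total.day.charge + 0.4 * dat$number.customer.service.calls
true_time <- rexp(n, rate = 0.002 * exp(lp))
censor_time <- runif(n, 50, 250)
dat$account.length <- pmin(true_time, censor_time)
dat$churn <- as.integer(true_time <= censor_time)

model <- coxph(Surv(account.length, churn) ~ total.day.charge +
                 number.customer.service.calls, data = dat)

# Hazard ratios: multiplicative change in churn hazard per one-unit increase,
# e.g. per extra customer service call
hr <- exp(coef(model))
hr

# Proportional hazards check (small p-values suggest a violation)
zph <- cox.zph(model)
zph
```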


Add strata (a separate baseline hazard for customers with at most 3 vs. more than 3 customer service calls):

model <- coxph(s ~ total.day.charge + strata(number.customer.service.calls <= 3), data=dat[, -4])
summary(model)
plot(survfit(model), col=c("blue", "red"))

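A sketch of what strata() does, assuming the survival package and simulated data (the few.calls flag and all values are hypothetical). Stratifying splits customers into groups that each get their own baseline hazard; no coefficient is estimated for the stratifying variable, and survfit returns one curve per group, which is why the plot above shows two lines.

```r
library(survival)

# Hypothetical simulated data
set.seed(3)
n <- 1000
calls <- rpois(n, 2)
day_charge <- runif(n, 10, 50)
true_time <- rexp(n, rate = 0.003 * exp(0.02 * day_charge + ifelse(calls > 3, 1, 0)))
censor_time <- runif(n, 50, 250)
dat <- data.frame(account.length = pmin(true_time, censor_time),
                  churn = as.integer(true_time <= censor_time),
                  total.day.charge = day_charge,
                  few.calls = calls <= 3)

model <- coxph(Surv(account.length, churn) ~ total.day.charge + strata(few.calls),
               data = dat)

# Only total.day.charge gets a coefficient; few.calls defines the strata
coef(model)

fit <- survfit(model)
names(fit$strata)  # one curve per stratum
plot(fit, col = c("blue", "red"), xlab = "Account length",
     ylab = "Proportion not churned")
legend("bottomleft", legend = names(fit$strata), col = c("blue", "red"), lty = 1)
```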

Andrie
  • Thanks Andrie, I have started working on it. I used coxph <- coxph(Surv(time,event)~x1, method="breslow") and found that international_plan and number_customer_service_calls are major factors for rapid churn. Now I want to predict for new data. Will update soon. – maddy Nov 23 '14 at 11:44
  • Here, does strata mean that we are segmenting customers into 2 groups, those with <= 3 calls and those with > 3 calls? If not, could someone please explain the importance of strata? – maddy Nov 24 '14 at 01:34
  • @Andrie: what is the interpretation of the survival plot above in the context of this data? – akhil verma Oct 29 '15 at 11:50
  • @maddy: what is the interpretation of the survival plot above in the context of this data? – akhil verma Oct 29 '15 at 11:51
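One way to read a survival plot like the ones above is to pull numbers off the curve. This sketch assumes the survival package and simulated tenures (all values and the chosen time points are hypothetical): each point on the curve is the estimated share of customers still active at that tenure, and the median is the tenure by which half are estimated to have churned.

```r
library(survival)

# Hypothetical simulated tenures with right-censoring
set.seed(11)
n <- 500
tenure_true <- rexp(n, rate = 0.008)
censor_time <- runif(n, 50, 250)
tenure <- pmin(tenure_true, censor_time)
churned <- as.integer(tenure_true <= censor_time)

fit <- survfit(Surv(tenure, churned) ~ 1)

# Estimated share of customers retained at selected tenures, with 95% bands
s <- summary(fit, times = c(50, 100, 150))
data.frame(time = s$time, retained = s$surv, lower = s$lower, upper = s$upper)

# Median tenure: where the curve crosses 0.5
quantile(fit, probs = 0.5, conf.int = FALSE)
```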