I am trying to implement the survival regression model documented here: Scala-Docs#Survival-Regression, but I cannot make heads or tails of how to do the actual implementation.
I am trying to model the "survivability" of a customer for a business. Survivability is a label given to customers based on whether they made a purchase in the last month. If a customer fails to make a purchase, they are considered dead/censored. The two factors I am taking into account are "number of times advertised to" and "amount of time spent on business website". Data is collected about each customer on a monthly basis.
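In survival-analysis terms, "dead" (the customer stopped purchasing) is the event, and "censored" means the event has not been observed yet. Spark's `AFTSurvivalRegression` documents its censor column this way: 1.0 means the event occurred (uncensored), 0.0 means the observation is censored. A minimal sketch of that encoding, assuming churn is the event of interest; the object and function names are hypothetical:

```scala
object CensorEncoding {
  // Spark's AFTSurvivalRegression interprets the censor column as:
  //   1.0 -> the event was observed (uncensored); here the event is "customer stopped purchasing"
  //   0.0 -> censored; the customer was still purchasing when observation ended
  def censorValue(purchasedLastMonth: Boolean): Double =
    if (purchasedLastMonth) 0.0 else 1.0

  def main(args: Array[String]): Unit = {
    // An active customer is censored; a churned customer contributes an observed event
    println(censorValue(purchasedLastMonth = true))
    println(censorValue(purchasedLastMonth = false))
  }
}
```

With this convention, the `1` on CustA's third row reads as "CustA churned in period 3", and CustB's all-zero rows read as "CustB was still active when the data ended".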
Here is what my data looks like for two customers (CustA and CustB) over three monthly time periods:
val seqCust = Seq(
//Customer,Period,Censor,# of Ads,Amount of Time on Site
("CustA",1,0,4,2400),
("CustA",2,0,6,1800),
("CustA",3,1,2,600),
("CustB",1,0,2,2800),
("CustB",2,0,4,2100),
("CustB",3,0,3,1200)
)
I then want to transform it into the label/censor/features format the docs use:
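import org.apache.spark.ml.linalg.Vectors // Spark 2.x+; Spark 1.x uses org.apache.spark.mllib.linalg.Vectors
val dfCust = seqCust.map(cr => (cr._2.toDouble, cr._3.toDouble, Vectors.dense(cr._4.toDouble, cr._5.toDouble))).toDF("label", "censor", "features") // toDF needs spark.implicits._ (auto-imported in spark-shell)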
So that my data now looks like this:
[1,0,[4,2400]],
[2,0,[6,1800]],
[3,1,[2,600]],
[1,0,[2,2800]],
[2,0,[4,2100]],
[3,0,[3,1200]]
And then do the following:
import org.apache.spark.ml.regression.AFTSurvivalRegression
val quantileProbabilities = Array(0.3, 0.6)
val aft = new AFTSurvivalRegression()
.setQuantileProbabilities(quantileProbabilities)
.setQuantilesCol("quantiles")
val model = aft.fit(dfCust)
// Print the coefficients, intercept and scale parameter for AFT survival regression
println(s"Coefficients: ${model.coefficients}")
println(s"Intercept: ${model.intercept}")
println(s"Scale: ${model.scale}")
model.transform(dfCust).show(false)
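To read the printed coefficients, intercept and scale, it may help to recompute the `prediction` and `quantiles` columns by hand. My understanding of Spark's Weibull-based AFT model (from `AFTSurvivalRegressionModel.predict` and `predictQuantiles`) is that the prediction is exp(coefficients · features + intercept) and each quantile p is that value multiplied by (−ln(1 − p))^scale. The sketch below reimplements that in plain Scala; the parameter values in `main` are hypothetical placeholders, not results fitted from the data above, and the object name is made up for illustration:

```scala
object AftQuantiles {
  // Predicted survival time (the Weibull scale lambda): exp(coefficients . features + intercept)
  def predictedTime(coefficients: Array[Double], intercept: Double, features: Array[Double]): Double = {
    require(coefficients.length == features.length, "coefficients and features must have the same length")
    val dot = coefficients.zip(features).map { case (b, x) => b * x }.sum
    math.exp(dot + intercept)
  }

  // p-quantile of a Weibull lifetime with scale lambda and shape 1/scale:
  //   t_p = lambda * (-ln(1 - p))^scale
  def quantiles(coefficients: Array[Double], intercept: Double, scale: Double,
                features: Array[Double], probs: Array[Double]): Array[Double] = {
    require(probs.forall(p => p > 0.0 && p < 1.0), "probabilities must lie in (0, 1)")
    val lambda = predictedTime(coefficients, intercept, features)
    probs.map(p => lambda * math.pow(-math.log1p(-p), scale))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical parameters for illustration only; substitute model.coefficients,
    // model.intercept and model.scale from the fitted model
    val coefficients = Array(0.05, 0.0004)
    val intercept = 0.5
    val scale = 0.8
    val features = Array(4.0, 2400.0) // CustA, period 1
    println(predictedTime(coefficients, intercept, features))
    println(quantiles(coefficients, intercept, scale, features, Array(0.3, 0.6)).mkString("[", ", ", "]"))
  }
}
```

A positive coefficient stretches the predicted time-to-churn for that feature; the `quantiles` column gives the times by which 30% and 60% of customers with those features are expected to have churned.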
But I do not understand:
- Is this the correct way to model the data according to the Spark documentation (Scala example)?
- How come I am not taking the customer ID into account anywhere?
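Spark's AFT model treats each row as an independent subject with one observed time and one fixed feature vector, so feeding it three monthly rows per customer counts each customer three times. One possible alternative shaping, sketched below in plain Scala, uses the customer ID only for grouping: it collapses each customer's rows into a single record whose label is the last period observed (up to and including churn), whose censor is 1.0 if churn was observed, and whose features are the per-month averages. Averaging the covariates is an assumption for illustration (using the last month's values is another option), and `CollapseCustomers` and `SurvivalRecord` are hypothetical names:

```scala
object CollapseCustomers {
  // One monthly observation, in the column order of seqCust:
  // (customer, period, censor, number of ads, time on site)
  type Obs = (String, Int, Int, Int, Int)

  // Hypothetical per-customer survival record
  final case class SurvivalRecord(customer: String, label: Double, censor: Double, features: Array[Double])

  def collapse(rows: Seq[Obs]): Seq[SurvivalRecord] =
    rows.groupBy(_._1).toSeq.sortBy(_._1).map { case (cust, obs) =>
      val ordered = obs.sortBy(_._2)
      // Keep rows up to and including the first period flagged as churn, if any
      val untilEvent = ordered.indexWhere(_._3 == 1) match {
        case -1 => ordered
        case i  => ordered.take(i + 1)
      }
      val last = untilEvent.last
      val n = untilEvent.size.toDouble
      // Assumption: summarize the monthly covariates by their averages
      val avgAds = untilEvent.map(_._4).sum / n
      val avgTime = untilEvent.map(_._5).sum / n
      SurvivalRecord(cust, last._2.toDouble, if (last._3 == 1) 1.0 else 0.0, Array(avgAds, avgTime))
    }

  def main(args: Array[String]): Unit = {
    val seqCust = Seq(
      ("CustA", 1, 0, 4, 2400), ("CustA", 2, 0, 6, 1800), ("CustA", 3, 1, 2, 600),
      ("CustB", 1, 0, 2, 2800), ("CustB", 2, 0, 4, 2100), ("CustB", 3, 0, 3, 1200)
    )
    collapse(seqCust).foreach { r =>
      println(s"${r.customer} label=${r.label} censor=${r.censor} features=${r.features.mkString("[", ",", "]")}")
    }
  }
}
```

The resulting records could then be turned into a DataFrame with something like `records.map(r => (r.label, r.censor, Vectors.dense(r.features))).toDF("label", "censor", "features")`, leaving the customer ID out of the features.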