
I am trying to implement the survival analysis model as documented here: Scala-Docs#Survival-Regression but I cannot make heads or tails of how you are supposed to do the actual implementation.

I am trying to model the "survivability" of a customer for a business. Survivability is a label given to customers based on whether they made a purchase in the last month. If a customer fails to make a purchase, they are considered dead (censored). The two factors I am taking into account are "number of times advertised to" and "amount of time spent on the business website". Data is collected about each customer on a monthly basis.
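One way to turn a customer's monthly purchase history into the (time, censor) pair that AFTSurvivalRegression expects is sketched below. It is plain Scala rather than Spark code, `survivalRecord` is a hypothetical helper, and it assumes Spark's documented convention that censor = 1 means the event (here, the first month without a purchase) was observed, while censor = 0 means the customer was still active at the end of the data (right-censored).

```scala
object SurvivalLabel {
  /** Hypothetical helper. purchased(i) is true if the customer bought something in month i + 1.
    * Returns (label, censor) using Spark's AFT convention:
    *   censor = 1 -> event observed: the first month without a purchase is month `label`
    *   censor = 0 -> right-censored: still purchasing after `label` observed months
    */
  def survivalRecord(purchased: Seq[Boolean]): (Int, Int) = {
    require(purchased.nonEmpty, "need at least one observed month")
    val firstMiss = purchased.indexWhere(bought => !bought)
    if (firstMiss >= 0) (firstMiss + 1, 1) // "died" in month firstMiss + 1
    else (purchased.length, 0)             // never lapsed: censored at the last observed month
  }

  def main(args: Array[String]): Unit = {
    println(survivalRecord(Seq(true, true, false))) // a CustA-like history
    println(survivalRecord(Seq(true, true, true)))  // a CustB-like history
  }
}
```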

Here is what my data looks like for two customers (CustA and CustB) over three monthly time periods:

```scala
val seqCust = Seq(
  // (customer, period, censor, # of ads, amount of time on site)
  ("CustA", 1, 0, 4, 2400),
  ("CustA", 2, 0, 6, 1800),
  ("CustA", 3, 1, 2, 600),
  ("CustB", 1, 0, 2, 2800),
  ("CustB", 2, 0, 4, 2100),
  ("CustB", 3, 0, 3, 1200)
)
```
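As far as I understand it, AFTSurvivalRegression treats every row of its input as a separate, independent subject, so the six customer-month rows above would be read as six customers. A minimal plain-Scala sketch of collapsing the data to one row per customer follows. The aggregation rules are assumptions, not something the Spark docs prescribe: the label is the last observed period, the censor flag is the maximum over the customer's rows, and the time-varying covariates are summarized by their monthly mean. `CustRow`, `SubjectRow` and `collapse` are hypothetical names.

```scala
object CollapseCustomers {
  // Hypothetical row types mirroring the tuples in seqCust
  final case class CustRow(customer: String, period: Int, censor: Int, ads: Int, timeOnSite: Int)
  final case class SubjectRow(customer: String, label: Double, censor: Double,
                              meanAds: Double, meanTimeOnSite: Double)

  val sample: Seq[CustRow] = Seq(
    CustRow("CustA", 1, 0, 4, 2400), CustRow("CustA", 2, 0, 6, 1800), CustRow("CustA", 3, 1, 2, 600),
    CustRow("CustB", 1, 0, 2, 2800), CustRow("CustB", 2, 0, 4, 2100), CustRow("CustB", 3, 0, 3, 1200)
  )

  /** One survival record per customer; the aggregation rules are assumptions, not Spark's. */
  def collapse(rows: Seq[CustRow]): Seq[SubjectRow] =
    rows.groupBy(_.customer).toSeq.sortBy(_._1).map { case (customer, rs) =>
      val n = rs.size.toDouble
      SubjectRow(
        customer = customer,
        label = rs.map(_.period).max.toDouble,        // last observed month
        censor = rs.map(_.censor).max.toDouble,       // 1 if the customer ever "died"
        meanAds = rs.map(_.ads).sum / n,              // average ads per month
        meanTimeOnSite = rs.map(_.timeOnSite).sum / n // average time on site per month
      )
    }

  def main(args: Array[String]): Unit = collapse(sample).foreach(println)
}
```

Each resulting record could then become one (label, censor, features) row, so the customer ID is used only for grouping and never reaches the model itself.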

I then transform it into the (label, censor, features) format that the docs specify:

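```scala
import org.apache.spark.ml.linalg.Vectors // toDF also needs spark.implicits._ (auto-imported in spark-shell)
val dfCust = seqCust.map(cr => (cr._2.toDouble, cr._3.toDouble, Vectors.dense(cr._4.toDouble, cr._5.toDouble))).toDF("label", "censor", "features")
```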

So that my data now looks like this:

[1,0,[4,2400]],
[2,0,[6,1800]],
[3,1,[2,600]],
[1,0,[2,2800]],
[2,0,[4,2100]],
[3,0,[3,1200]]
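Before fitting, it may be worth sanity-checking the (label, censor) pairs. The AFT model works with log(label), so labels should be strictly positive, and under Spark's convention the censor column should contain only 0.0 or 1.0. A small plain-Scala check, with `AftInputCheck.problems` as a hypothetical name:

```scala
object AftInputCheck {
  /** Lists every (label, censor) pair that looks invalid for an AFT fit. */
  def problems(rows: Seq[(Double, Double)]): Seq[String] =
    rows.zipWithIndex.flatMap { case ((label, censor), i) =>
      val labelIssue =
        if (label > 0.0) Nil else List(s"row $i: label $label must be > 0 (log(label) is used)")
      val censorIssue =
        if (censor == 0.0 || censor == 1.0) Nil else List(s"row $i: censor $censor must be 0 or 1")
      labelIssue ++ censorIssue
    }

  def main(args: Array[String]): Unit = {
    // The (label, censor) pairs from the transformed data
    val rows = Seq((1.0, 0.0), (2.0, 0.0), (3.0, 1.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0))
    val found = problems(rows)
    println(if (found.isEmpty) "no problems found" else found.mkString("\n"))
  }
}
```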

And then do the following:

```scala
import org.apache.spark.ml.regression.AFTSurvivalRegression

val quantileProbabilities = Array(0.3, 0.6)
val aft = new AFTSurvivalRegression()
  .setQuantileProbabilities(quantileProbabilities)
  .setQuantilesCol("quantiles")

val model = aft.fit(dfCust)

// Print the coefficients, intercept and scale parameter for AFT survival regression
println(s"Coefficients: ${model.coefficients}")
println(s"Intercept: ${model.intercept}")
println(s"Scale: ${model.scale}")
model.transform(dfCust).show(false)
```
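To interpret the `quantiles` column that `model.transform` produces, it may help to recompute it by hand from the printed parameters. The Spark docs describe this as a Weibull AFT model; as I read it, λ = exp(intercept + coefficients · x), the Weibull shape is 1 / scale, and the p-quantile is λ · (−ln(1 − p))^scale. The plain-Scala sketch below assumes exactly that parametrization; `AftQuantiles.quantiles` is a hypothetical name, and the parameters in `main` are placeholders to be replaced with the values printed above.

```scala
object AftQuantiles {
  /** Recomputes AFT quantiles assuming a Weibull parametrization:
    * lambda = exp(intercept + coefficients . features), shape k = 1 / scale,
    * p-quantile = lambda * (-ln(1 - p))^(1 / k) = lambda * (-ln(1 - p))^scale.
    */
  def quantiles(coefficients: Array[Double], intercept: Double, scale: Double,
                features: Array[Double], probs: Array[Double]): Array[Double] = {
    require(coefficients.length == features.length, "coefficients and features differ in length")
    val margin = intercept + coefficients.zip(features).map { case (b, x) => b * x }.sum
    val lambda = math.exp(margin) // assumed to match the value in the prediction column
    probs.map(p => lambda * math.pow(-math.log1p(-p), scale))
  }

  def main(args: Array[String]): Unit = {
    // Placeholder parameters: replace with the printed Coefficients, Intercept and Scale
    val q = quantiles(Array(0.0, 0.0), 0.0, 1.0, Array(4.0, 2400.0), Array(0.3, 0.6))
    println(q.mkString(", "))
  }
}
```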

But I do not understand:

  1. Is this the correct way to model the data as per the Spark documentation (Scala example)?
  2. How come I am not taking the customer ID into account anywhere?

I'm not sure about your first question; it isn't clear to me. As for your second question, the *default* label, censor and features columns are "label", "censor" and "features" respectively. That's why you didn't need to specify them explicitly. – eliasah Mar 21 '19 at 09:30
