
I train my survival model with the following lines:

from lifelines import WeibullAFTFitter

wft = WeibullAFTFitter()
wft.fit(train, 'duration', event_col='y')

After this, I want to get each row's survival probability at its own current time (the value in its duration column).

I am currently doing this with the following for loop:

p_surv = np.zeros(len(test))
for i in range(len(p_surv)):
    row = test.iloc[i:i+1].drop(dep_var, axis=1)
    t = test.iloc[i:i+1, col_num]
    p_surv[i] = wft.predict_survival_function(row, t).values[0][0]

However, this is really slow because the loop runs over 200k+ rows. The alternative, wft.predict_survival_function(test, test['duration']), would create a 200000x200000 matrix, since it evaluates every row against every provided time.
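To put a number on why the full matrix is not an option, here is a rough back-of-the-envelope estimate. It assumes the result is stored densely as 8-byte floats (float64); the variable names are just for illustration.

```python
# Rough memory estimate for the full "every row x every time" result.
# Assumption: dense storage, one float64 (8 bytes) per entry.
n_rows = 200_000
n_times = 200_000
bytes_per_float64 = 8

total_bytes = n_rows * n_times * bytes_per_float64
total_gb = total_bytes / 1e9  # decimal gigabytes

print(f"{total_gb:.0f} GB for the full matrix")
```

Only the diagonal of that matrix (200,000 values) is actually needed.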

I only want each row's survival probability evaluated at its own duration. Is there a function in lifelines that does this?

sachinruk

1 Answer


Good question. I think that, for now, the best way is to reproduce what predict_survival_function does internally, but evaluate each row only at its own time. That is, do something like this:

import numpy as np

def predict_cumulative_hazard_at_single_time(self, X, times, ancillary_X=None):
    # Uses a private lifelines helper, so this may break between versions.
    lambda_, rho_ = self._prep_inputs_for_prediction_and_return_scores(X, ancillary_X)
    # Weibull cumulative hazard, evaluated elementwise: row i at times[i].
    return (times / lambda_) ** rho_

def predict_survival_function_at_single_time(self, X, times, ancillary_X=None):
    # S(t) = exp(-H(t))
    return np.exp(-self.predict_cumulative_hazard_at_single_time(X, times=times, ancillary_X=ancillary_X))


# Bind both functions to the fitted model instance as methods.
wft.predict_survival_function_at_single_time = predict_survival_function_at_single_time.__get__(wft)
wft.predict_cumulative_hazard_at_single_time = predict_cumulative_hazard_at_single_time.__get__(wft)

# One survival probability per row, each at that row's own duration.
p_surv2 = wft.predict_survival_function_at_single_time(test, test['duration'])

I think something like that would work. This might be something I add to the API in the future.
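To sanity-check the idea without a fitted model, here is a minimal NumPy-only sketch. It assumes you already have per-row Weibull scale (lambda_) and shape (rho_) values; the helper name weibull_survival_at_own_time and the numbers are made up for illustration and are not part of lifelines. It checks that the elementwise formula matches the diagonal of the full times-by-subjects matrix.

```python
import numpy as np

def weibull_survival_at_own_time(times, lambda_, rho_):
    """Elementwise S(t_i) = exp(-(t_i / lambda_i) ** rho_i)."""
    times = np.asarray(times, dtype=float)
    lambda_ = np.asarray(lambda_, dtype=float)
    rho_ = np.asarray(rho_, dtype=float)
    return np.exp(-((times / lambda_) ** rho_))

# Made-up values for three subjects (illustration only).
times = np.array([2.0, 5.0, 1.0])
lambda_ = np.array([4.0, 10.0, 3.0])
rho_ = np.array([1.5, 1.5, 1.5])

# One probability per subject: O(n) work and memory.
p_own = weibull_survival_at_own_time(times, lambda_, rho_)

# The full matrix (rows are times, columns are subjects): O(n^2).
full = np.exp(-((times[:, None] / lambda_[None, :]) ** rho_[None, :]))

# The elementwise result is exactly the diagonal of the full matrix.
assert np.allclose(p_own, np.diag(full))
```

For a small subset of the real data, the same kind of comparison against the original for loop should confirm that p_surv2 agrees with p_surv.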

sachinruk
Cam.Davidson.Pilon