I am using the pysurvival
library to model with the Cox proportional hazard model (CPH). Instead of getting the survival curves, I am interested in getting point predictions. In the library, the function predict_survival
returns an array-like representing the prediction of the survival function which I assume that I can use to get the expected values - but I just cant find the right way.
Below I've attached a dummy example.
# Initializing the simulation model
sim = SimulationModel( survival_distribution = 'log-logistic',
risk_type = 'linear',
censored_parameter = 10.1,
alpha = 0.1, beta=3.2 )
# Generating N random samples
N = 200
dataset = sim.generate_data(num_samples = N, num_features = 4)
# Defining the features
features = sim.features
# Creating the X, T and E input
X, T, E = dataset[features], dataset['time'].values, dataset['event'].values
# Building the model
coxph = CoxPHModel()
coxph.fit(X,T,E, lr=0.5, l2_reg=1e-2, init_method='zeros')
When applying the function:
coxph.predict_survival(x=X)
It returns an array of arrays with the shape (200, 87)
- why does it give exactly 87 values for every observation?
As far as I understand I should be able to get the expected value by taking the integral below the curve of the survival curve.
To do this I need to calculate the area under the curve which I think can be done using trapz
in the numpy
library, but I need to know how the spacing between the points are done.