I'm fitting a Cox PH model with the lifelines package in Python.
Strangely, fitting the model on the whole dataset works without any problem, but cross-validation (using the package's own k_fold_cross_validation function) fails with a convergence error.
Any idea how I can solve this? The documentation suggests using a penalizer, but I haven't found a value that lets the cross-validation run.
Here's my code if you're wondering:
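from lifelines import CoxPHFitter
from lifelines.utils import k_fold_cross_validation

# Works: fit on the full dataset
cph = CoxPHFitter()
cph.fit(daten, "length_of_arrears2", event_col='cured2')

# Fails: 5-fold cross-validation raises ConvergenceError
cph = CoxPHFitter(penalizer=10)
scores = k_fold_cross_validation(cph, daten, 'length_of_arrears2', event_col='cured2', k=5)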
This is the error it outputs:
ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-modelMatrix is singular.
I checked the correlation table and some variables are quite strongly correlated, but it still seems odd to me that the fit works on the full dataset and not under cross-validation.
Is there a good way to get rid of high correlation without removing a variable completely?
Edit:
I ran a few more tests. First, I removed all variables with a pairwise correlation above 0.74, but the k-fold cross-validation still failed.
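A minimal sketch of that kind of correlation filter, in case the exact procedure matters. It is not the exact code used: the helper name drop_correlated and the toy data standing in for daten are hypothetical, and it assumes absolute Pearson correlation with the duration and event columns left out of the comparison.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.74, exclude=()):
    """Drop one column from every pair whose |Pearson r| exceeds threshold.

    Hypothetical helper: columns listed in `exclude` (duration and event
    columns) are never compared and never dropped.
    """
    features = df.drop(columns=list(exclude))
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Toy data standing in for `daten` (assumption: the real covariates are numeric).
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=200),  # nearly a copy of "a"
    "c": rng.normal(size=200),
    "length_of_arrears2": rng.exponential(size=200),
    "cured2": rng.integers(0, 2, size=200),
})

reduced, dropped = drop_correlated(
    toy, exclude=["length_of_arrears2", "cured2"]
)
print("dropped:", dropped)
print("remaining:", list(reduced.columns))
```

Note that a filter like this only looks at pairs of columns, so it would not catch a column that is a linear combination of several others.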
Then I split the data manually. A 90/10 split worked, and so did the splits down to 70/30, but 60/40 already failed. Any idea?
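For reference, a rough sketch of how such manual splits can be reproduced. The helper names manual_split and constant_columns and the toy data are hypothetical. The check for columns that become constant in a smaller training part is only an assumption about a possible cause, not something confirmed for this data.

```python
import numpy as np
import pandas as pd


def manual_split(df, train_frac, seed=0):
    """Shuffle the rows once, then cut them into a training and a test part."""
    shuffled = df.sample(frac=1.0, random_state=seed)
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled.iloc[:n_train], shuffled.iloc[n_train:]


def constant_columns(df):
    """Return the columns that hold a single distinct value in this subset."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]


# Toy data standing in for `daten`; "rare_flag" is 1 in only two rows,
# so it may become constant in a small enough training part.
rng = np.random.default_rng(1)
toy = pd.DataFrame({
    "x": rng.normal(size=100),
    "rare_flag": [1, 1] + [0] * 98,
    "length_of_arrears2": rng.exponential(size=100),
    "cured2": rng.integers(0, 2, size=100),
})

for train_frac in (0.9, 0.8, 0.7, 0.6):
    train, test = manual_split(toy, train_frac)
    # The fit on `train` would go here, e.g.
    # cph.fit(train, "length_of_arrears2", event_col="cured2")
    print(train_frac, len(train), len(test), constant_columns(train))
```

Using the same seed for every fraction keeps the shuffle order fixed, so each smaller training part is a subset of the larger ones and the splits are directly comparable.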