I'm fitting a Cox proportional hazards model using the lifelines package in Python.

Fitting the model on the whole dataset works without any problem. However, when I run cross-validation (using the package's own k_fold_cross_validation function), a convergence error appears.

Any idea how I can solve this? The documentation suggests using a penalizer, but I haven't found a value that lets the cross-validation run.

Here's my code if you're wondering:

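from lifelines import CoxPHFitter
from lifelines.utils import k_fold_cross_validation

# Works: fit on the full dataset
cph = CoxPHFitter()
cph.fit(daten, "length_of_arrears2", event_col='cured2')

# Fails: 5-fold cross-validation raises ConvergenceError
cph = CoxPHFitter(penalizer=10)
scores = k_fold_cross_validation(cph, daten, 'length_of_arrears2', event_col='cured2', k=5)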

This is the error it outputs:

ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-model Matrix is singular.

I checked the correlation table and some variables are quite strongly correlated, but it still seems odd to me that the fit works on the full dataset and not during cross-validation.
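A pairwise correlation table can miss collinearity that involves three or more columns at once, so a rank check on the covariate matrix may be a useful complement. The following is a minimal sketch using NumPy and pandas on synthetic data. The function name `collinearity_report` and the columns `cat_a`, `cat_b`, `cat_c` and `age` are hypothetical and are not taken from the dataset above.

```python
import numpy as np
import pandas as pd


def collinearity_report(covariates: pd.DataFrame) -> dict:
    """Summarise multivariate collinearity of a numeric covariate frame."""
    X = covariates.to_numpy(dtype=float)
    # Centre the columns: the Cox model has no intercept term, so a set of
    # dummies that sums to a constant is redundant.
    Xc = X - X.mean(axis=0)
    rank = int(np.linalg.matrix_rank(Xc))
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    return {
        "n_columns": X.shape[1],
        "rank": rank,
        "rank_deficient": rank < X.shape[1],
        "smallest_singular_value": float(singular_values.min()),
    }


# Synthetic example (hypothetical columns): all three dummies of one
# categorical variable are kept, so they always sum to 1.
rng = np.random.default_rng(0)
category = rng.integers(0, 3, size=200)
demo = pd.DataFrame({
    "cat_a": (category == 0).astype(float),
    "cat_b": (category == 1).astype(float),
    "cat_c": (category == 2).astype(float),
    "age": rng.normal(40, 10, size=200),
})

report = collinearity_report(demo)

# Largest absolute pairwise correlation, ignoring the diagonal.
corr = demo.corr().to_numpy(copy=True)
np.fill_diagonal(corr, 0.0)
max_abs_corr = float(np.abs(corr).max())

print(report)
print("largest pairwise |correlation|:", max_abs_corr)
```

In this synthetic case the covariate matrix is rank deficient even though no single pair of columns is strongly correlated, which is the kind of situation a correlation threshold alone would not catch.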

Is there a good way to get rid of high correlation without removing a variable completely?

Edit:

I ran a few more tests. First, I removed all variables with a pairwise correlation above 0.74; the k-fold cross-validation still failed.

Then I split the data manually. A 90/10 split worked, and so did every split down to 70/30, but 60/40 already failed. Any idea?

amestrian
  • You probably have a marginal sample size, and the subsetting process is allocating categorical variables in a manner that leaves some combination of them collinear in a multivariate sense. – IRTFM Nov 12 '20 at 16:56
  • I thought so, however I have 14000 observations in total. I wouldn't consider that marginal, right? – amestrian Nov 12 '20 at 16:59
  • I thought 90:10 splits done ten times was a typical CV regimen? A better validation approach might be a bootstrap estimate, at least if I am reading Harrell's classic RMS text correctly. – IRTFM Nov 12 '20 at 22:46
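
The subsetting effect described in the first comment can be checked directly: for each training subset, list the covariates that have become constant, for example a rare dummy category whose few rows all landed in the held-out part. The following is a rough sketch under stated assumptions. The helper `degenerate_columns_per_fold` and the columns `rare_flag` and `age` are hypothetical, the data is synthetic, and the shuffled splits here are not guaranteed to match the fold assignments that lifelines makes internally.

```python
import numpy as np
import pandas as pd


def degenerate_columns_per_fold(df, covariate_cols, k=5, seed=0):
    """For each of k shuffled folds, list covariates that are constant
    in the training part (all rows except the held-out fold)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(df))
    folds = np.array_split(shuffled, k)
    problems = {}
    for i, test_idx in enumerate(folds):
        # Boolean mask selecting the training rows by position.
        mask = np.ones(len(df), dtype=bool)
        mask[test_idx] = False
        train = df.iloc[mask]
        problems[i] = [c for c in covariate_cols if train[c].nunique() <= 1]
    return problems


# Synthetic example: "rare_flag" is 1 for a single row only, so whichever
# fold holds that row out leaves a constant column in the training part.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "rare_flag": np.zeros(1000),
    "age": rng.normal(40, 10, size=1000),
})
demo.loc[123, "rare_flag"] = 1.0

problems = degenerate_columns_per_fold(demo, ["rare_flag", "age"], k=5)
for fold, cols in problems.items():
    print(fold, cols)
```

A column flagged this way has zero variance in that training subset, which would be consistent with a fit that succeeds on the full data but fails on some folds.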
