
I keep getting error messages whenever I try to run a CoxPH regression in Python. I'm not a pro in Python; I'm still learning.

import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
from lifelines import KaplanMeierFitter
from lifelines.statistics import multivariate_logrank_test   
from lifelines.statistics import logrank_test
from lifelines import CoxPHFitter
import pyreadstat

After loading the data, I convert the columns to integers and label the categorical values:

data["faculty2"] = data["faculty2"].astype(int)
data["sex"] = data["sex"].astype(int)
data["mos"] = data["mos"].astype(int)
data["state2"] = data["state2"].astype(int)
data["ss"] = data["ss"].astype(int)
data["supervisor"] = data["supervisor"].astype(int)
data["time"] = data["time"].astype(int)
data["event"] = data["event"].astype(int)

Eventvar = data['event']
Timevar = data['time']

# Assign labels to the coded values
data['sex'] = data['sex'].apply({1:'Male', 0:'Female'}.get)
data['faculty2'] = data['faculty2'].apply({1:'Arts', 2:'Sciences', 3:'Medicals',
                                            4:'Agriculture', 5:'Social Sciences', 6:'Education',
                                            7:'Tech', 8:'Law', 9:'Institutes'}.get)
data['state2'] = data['state2'].apply({1:'SW', 2:'SS', 3:'SE', 4:'NC', 5:'NE', 6:'NW'}.get)
data['ss'] = data['ss'].apply({1:'Yes', 0:'No'}.get)
data['mos'] = data['mos'].apply({1:'Full Time', 0:'Part Time'}.get)

cf = CoxPHFitter()
cf.fit(data, 'time', event_col='event',show_progress=True)
cf.print_summary()

I get this error message when I run this code:

ValueError: could not convert string to float: 'Arts'

Please, I need help; I don't know how to go about this. If I add dummies, I get a different error message:

ohe_features = ['faculty2', 'sex', 'mos','state2','ss'] 
data = pd.get_dummies(data,drop_first=True,columns=ohe_features)

This is the error message I get

ConvergenceError: Convergence halted due to matrix inversion problems. Suspicion is high collinearity. Please see the following tips in the lifelines documentation: https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-model Matrix is singular.

If I run the code without assigning labels to the values and without adding dummies, it runs, but the different levels are not shown. The variables are treated as though they were continuous.

Here is the data

Soloibom
  • Please edit your question to add additional information, it's easier to read there than in the comments. – paisanco Dec 13 '20 at 17:22
  • If you are getting warnings about high collinearity, it sounds like your independent variables are too correlated with each other for the algorithm to converge to a meaningful solution. That's more a numerical problem than a programming problem. The ValueError looked like a syntax problem in your data frame however. Without knowing your data this one could be tough to diagnose in the SO format. – paisanco Dec 13 '20 at 17:55
  • I can attach the data if needed – Soloibom Dec 13 '20 at 18:07
  • I don't have the time to troubleshoot it, this is why I am just making suggestions via comment. – paisanco Dec 13 '20 at 18:10
  • @Soloibom without the data it's hard to debug. Can you attach / link to it? – Cam.Davidson.Pilon Dec 14 '20 at 02:13

2 Answers

1

For convergence problems like this, the lifelines documentation suggests that you:

  1. Add the penalizer parameter to CoxPHFitter,
  2. Use the variance inflation factor (VIF) to find redundant variables, or
  3. Check the correlation matrix of your dataset.

https://lifelines.readthedocs.io/en/latest/Examples.html#problems-with-convergence-in-the-cox-proportional-hazard-model
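
As a rough illustration of the correlation-matrix check, here is a minimal sketch using only pandas. The helper name `collinearity_report`, the 0.95 threshold, and the toy data are all made up for this example; the constant-column check is an extra assumption of mine, since a dummy column that never varies can also make the matrix singular.

```python
import pandas as pd

def collinearity_report(df, duration_col, event_col, threshold=0.95):
    """Hypothetical helper: list constant covariates and highly correlated pairs."""
    # Only the covariates matter here, so drop the duration and event columns.
    X = df.drop(columns=[duration_col, event_col]).astype(float)
    # A column with a single unique value carries no information.
    constant = [c for c in X.columns if X[c].nunique() <= 1]
    corr = X.drop(columns=constant).corr().abs()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] >= threshold:
                pairs.append((a, b, round(float(corr.loc[a, b]), 3)))
    return constant, pairs

# Toy data (invented) that mimics a dummy-encoded frame.
toy = pd.DataFrame({
    "time": [5, 8, 12, 3, 9, 7],
    "event": [1, 0, 1, 1, 0, 1],
    "sex_Male": [1, 0, 1, 0, 1, 0],
    "mos_Part Time": [1, 0, 1, 0, 1, 0],  # identical to sex_Male
    "ss_Yes": [1, 1, 1, 1, 1, 1],         # never varies
    "supervisor": [2, 1, 3, 1, 2, 4],
})

constant, pairs = collinearity_report(toy, "time", "event")
print("Constant columns:", constant)
print("Highly correlated pairs:", pairs)
```

Any column this flags is a candidate to drop or merge before fitting. Pairwise correlation will not catch every linear dependence among three or more columns, which is where the VIF suggestion is more thorough.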

0

I had a nearly identical problem. I changed

cph = CoxPHFitter()

to

cph = CoxPHFitter(penalizer=0.0001)

This solved the issue.

S Ghosh