This very simple piece of code,

# imports...
from lifelines import CoxPHFitter
import pandas as pd

src_file = "Pred.csv"

df = pd.read_csv(src_file, header=0, delimiter=',')
df = df.drop(columns=['score'])

cph = CoxPHFitter()
cph.fit(df, duration_col='Length', event_col='Status', show_progress=True)

produces an error:

Traceback (most recent call last):
  File "C:/Users/.../predictor.py", line 11, in <module>
    cph.fit(df, duration_col='Length', event_col='Status', show_progress=True)
  File "C:\Users\...\AppData\Local\conda\conda\envs\hrpred\lib\site-packages\lifelines\fitters\coxph_fitter.py", line 298, in fit
    self._check_values(df)
  File "C:\Users\...\AppData\Local\conda\conda\envs\hrpred\lib\site-packages\lifelines\fitters\coxph_fitter.py", line 323, in _check_values
    cols = str(list(X.columns[low_var]))
  File "C:\Users\...\AppData\Local\conda\conda\envs\hrpred\lib\site-packages\pandas\core\indexes\base.py", line 1754, in __getitem__
    result = getitem(key)
IndexError: boolean index did not match indexed array along dimension 0; dimension is 88 but corresponding boolean dimension is 76

However, when I print df itself, everything looks fine. As the traceback shows, the error is raised entirely inside the library, and the library's own examples work fine.

Mohamed Thasin ah
slesher
  • We would need the files you are using here to find out exactly what is going wrong. Could you provide an example that does not use external files? – 1313e Jan 03 '18 at 11:37
  • Well, unfortunately I cannot provide the file because it contains proprietary information. But the file is processed perfectly by pandas; I can print and slice the data. Is the root of the problem in pandas or in CoxPHFitter (lifelines)? – slesher Jan 12 '18 at 14:15
  • If not the data itself, can you provide the following information: the columns in the dataframe, the dtype of each column, and the number of null and not-null values in each column, e.g. `df.columns`, `df[column1].dtype`, `sum(df[column1].isnull())`, and `sum(df[column1].notnull())` (for each column). It sounds like indexes aren't being aligned, which makes me wonder about null/NaN values. – TCAllen07 Apr 26 '18 at 01:53
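
The per-column checks requested in the last comment can be gathered in one pass. The following is only a sketch: the DataFrame is a hypothetical stand-in for the proprietary file, and `summarize_columns` is an illustrative helper, not part of pandas or lifelines. Listing the non-numeric columns is an extra step based on an assumption the thread does not confirm, namely that the 88 vs. 76 mismatch could come from columns the variance check skips.

```python
import pandas as pd


def summarize_columns(df):
    """Return one row per column: dtype, null count, non-null count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_null": df.isnull().sum(),
        "n_notnull": df.notnull().sum(),
    })


# Hypothetical data; only 'Length' and 'Status' are names from the question.
df = pd.DataFrame({
    "Length": [5.0, 8.0, 12.0, 3.0],
    "Status": [1, 0, 1, 1],
    "age": [61.0, None, 70.0, 48.0],
    "site": ["A", "B", "A", "C"],
})

summary = summarize_columns(df)
print(summary)

# Columns that are not numeric (assumed, not confirmed, to be relevant here).
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print("non-numeric columns:", non_numeric)
```

Posting this summary instead of the data itself would answer the comment without exposing the file's contents.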

1 Answer

Without knowing what your data look like, I can only say that I had the same error, and it was resolved when I removed every column except the duration, event, and covariate column(s) from the pandas df I was using. That is, I had a lot of extra columns in the df that were confusing the Cox PH fitter, since cph.fit() has no argument for choosing which columns to include as covariates: as far as I can tell, every column other than the duration and event columns is treated as one.
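
A minimal sketch of that workaround, assuming the covariates of interest are known by name. The data, the `age` covariate, and the helpers `select_model_columns` and `fit_cox` are hypothetical; only `CoxPHFitter`, `duration_col`, and `event_col` come from the question.

```python
import pandas as pd


def select_model_columns(df, duration_col, event_col, covariates):
    """Return a copy holding only the columns the Cox model should see."""
    keep = [duration_col, event_col] + list(covariates)
    missing = [c for c in keep if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return df[keep].copy()


def fit_cox(model_df, duration_col, event_col):
    """Fit a Cox model on an already-trimmed DataFrame."""
    # Imported lazily so the column selection runs even without lifelines.
    from lifelines import CoxPHFitter

    cph = CoxPHFitter()
    cph.fit(model_df, duration_col=duration_col, event_col=event_col)
    return cph


# Hypothetical stand-in for the real file, with two extra columns.
raw = pd.DataFrame({
    "Length": [5, 8, 12, 3, 9, 7],
    "Status": [1, 0, 1, 1, 0, 1],
    "age": [61, 54, 70, 48, 66, 59],
    "patient_id": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "notes": ["", "late", "", "", "moved", ""],
})

model_df = select_model_columns(raw, "Length", "Status", ["age"])
print(model_df.columns.tolist())
```

With the real data, the trimmed frame would then be passed on as `fit_cox(model_df, "Length", "Status")`; the toy frame above is too small to give a meaningful fit, so the call is left out here.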

La Pet