
I'm trying to run a survival analysis on a large dataset (about 80 rows x 12,000 cols) in Python.

Currently I'm using:

from lifelines import CoxPHFitter

# df holds the covariate columns plus the 'Time' and 'Status' columns
cf = CoxPHFitter()
cf.fit(df, duration_col='Time', event_col='Status')

But it is extremely slow. Breaking the dataframe up into chunks of 100 columns and running cf.fit once per chunk is slightly faster, but it still clocks in at around 80s. This is notably slower than R's coxph, and I'd really prefer not to use rpy2 to run the analysis in R.
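To be concrete, the chunked version looks roughly like this (the fit_in_chunks helper name and the per-chunk summary collection are just illustrative of the approach, not my exact code):

import pandas as pd
from lifelines import CoxPHFitter

def fit_in_chunks(df, chunk_size=100):
    # Everything except 'Time' and 'Status' is treated as a covariate
    covariates = [c for c in df.columns if c not in ('Time', 'Status')]
    summaries = []
    for start in range(0, len(covariates), chunk_size):
        cols = covariates[start:start + chunk_size]
        cph = CoxPHFitter()
        # Fit a separate Cox model on this block of covariates
        cph.fit(df[cols + ['Time', 'Status']], duration_col='Time', event_col='Status')
        summaries.append(cph.summary)  # coefficients, hazard ratios, p-values per covariate
    return pd.concat(summaries)

Each chunk is fit as its own model, so this isn't equivalent to a single joint fit over all 12,000 covariates, but even so it's slow.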

I'm a bit at a loss for how to make this faster, so any suggestions would be greatly appreciated.

  • Try with larger chunks, for example 1000 – Isma Mar 09 '18 at 09:16
  • While there may be computational problems, I think `(about 80 rows x 12,000 cols)` suggests a high-order problem in your analysis. Can I ask what kind of data you have, and how you arrived at 12k columns? – Cam.Davidson.Pilon Mar 12 '18 at 02:48

0 Answers