
I'm trying to run a survival analysis on a large dataset (about 80 rows x 12,000 cols) in Python.

Currently I'm using:

from lifelines import CoxPHFitter

# df holds the covariate columns plus the 'Time' and 'Status' columns
cf = CoxPHFitter()
cf.fit(df, duration_col='Time', event_col='Status')

But it is extremely slow. Breaking the dataframe up into chunks of 100 columns and running cf.fit once per chunk is slightly faster, but it still clocks in at around 80s. This is notably slower than R's coxph, and I'd really prefer not to use rpy2 to run the analysis in R.
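To be concrete, the chunked version looks roughly like this (the fit_in_chunks helper name and the per-chunk summary collection are just illustrative of the approach, not my exact code):

import pandas as pd
from lifelines import CoxPHFitter

def fit_in_chunks(df, chunk_size=100):
    # Everything except 'Time' and 'Status' is treated as a covariate
    covariates = [c for c in df.columns if c not in ('Time', 'Status')]
    summaries = []
    for start in range(0, len(covariates), chunk_size):
        cols = covariates[start:start + chunk_size]
        cph = CoxPHFitter()
        # Fit a separate Cox model on this block of covariates
        cph.fit(df[cols + ['Time', 'Status']], duration_col='Time', event_col='Status')
        summaries.append(cph.summary)  # coefficients, hazard ratios, p-values per covariate
    return pd.concat(summaries)

Each chunk is fit as its own model, so this isn't equivalent to a single joint fit over all 12,000 covariates, but even so it's slow.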

I'm a bit at a loss for how to make this faster, so any suggestions would be greatly appreciated.

  • Try with larger chunks, for example 1000 – Isma Mar 09 '18 at 09:16
  • While there may be computational problems, I think `(about 80 rows x 12,000 cols)` suggests a high-order problem in your analysis. Can I ask what kind of data you have, and how you arrived at 12k columns? – Cam.Davidson.Pilon Mar 12 '18 at 02:48

0 Answers