I'm trying to build a scikit-learn Pipeline (sklearn.pipeline.Pipeline) for survival analysis with two stages:

  1. Handling class imbalance using imblearn classes.
  2. Running the survival analysis using scikit-survival classes.

The problem I'm having is an incompatibility between the targets these two libraries expect: for imblearn the target is binary, while for scikit-survival it is continuous (a time-to-event outcome). Since the pipeline object only takes a single target vector, I'm unable to combine these two steps. Does anyone know a workaround for building a pipeline that uses different target vectors for different steps? Thank you in advance.

Example:

from sklearn.pipeline import make_pipeline
from sksurv.linear_model import CoxPHSurvivalAnalysis
from imblearn.under_sampling import RandomUnderSampler

# Load data (data, feats and target are defined elsewhere)
X_train = data[feats]
y_train = data[target]

# Construct pipe: the sampler wants a binary target,
# the Cox model wants a survival target
steps = [RandomUnderSampler(), CoxPHSurvivalAnalysis()]
cph = make_pipeline(*steps)
cph.fit(X_train, y_train)  # fails: one y cannot satisfy both steps
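
One direction that might work is to fold the resampling into the final estimator, so the pipeline only ever sees the survival target. The sketch below is a rough illustration and is not tested against scikit-survival itself. It assumes y is a NumPy structured array with a boolean field named "event" and a float field named "time" (the field names are an assumption and should match your data). `undersample_by_event` and `UndersampledSurvivalEstimator` are hypothetical helpers, not part of imblearn or scikit-survival, and the undersampling is done with plain NumPy rather than RandomUnderSampler.

```python
import numpy as np


def undersample_by_event(X, y, event_field="event", random_state=0):
    """Randomly drop rows of the majority event class until both classes match.

    Assumes y is a NumPy structured array with a boolean event field.
    X is converted to a plain array, so DataFrame column names are lost.
    """
    rng = np.random.default_rng(random_state)
    event = np.asarray(y[event_field], dtype=bool)
    idx_event = np.flatnonzero(event)
    idx_censored = np.flatnonzero(~event)
    n_keep = min(len(idx_event), len(idx_censored))
    keep = np.concatenate([
        rng.choice(idx_event, size=n_keep, replace=False),
        rng.choice(idx_censored, size=n_keep, replace=False),
    ])
    keep.sort()  # preserve the original row order
    return np.asarray(X)[keep], y[keep]


class UndersampledSurvivalEstimator:
    """Hypothetical wrapper: balance on the event flag, then fit the wrapped model."""

    def __init__(self, estimator, event_field="event", random_state=0):
        self.estimator = estimator
        self.event_field = event_field
        self.random_state = random_state

    def fit(self, X, y):
        # Resampling happens only at fit time, on the binary event indicator.
        X_res, y_res = undersample_by_event(
            X, y, event_field=self.event_field, random_state=self.random_state
        )
        # The wrapped model still receives the full survival target.
        self.estimator.fit(X_res, y_res)
        return self

    def predict(self, X):
        # Prediction uses all rows; nothing is resampled here.
        return self.estimator.predict(X)
```

The idea would be to pass something like UndersampledSurvivalEstimator(CoxPHSurvivalAnalysis()) as the last pipeline step instead of listing the sampler as its own step. For use with clone or grid search the wrapper would probably also need to subclass sklearn.base.BaseEstimator.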
tomas-silveira
  • can you give an example of the desired pipeline (step1 -> ... -> stepn)? – warped Apr 21 '22 at 16:43
  • Added an example. The problem is in the target input that RandomUnderSampler() and CoxPHSurvivalAnalysis() expect. The former expects a binary input and the latter a continuous one. – tomas-silveira Apr 21 '22 at 16:52

0 Answers