I am exploring the number of features that would be best to use for my models. I understand that RepeatedStratifiedKFold requires a single 1D target array, but I am trying to evaluate the number of features for a target that has multiple outputs. Is there a way to use RepeatedStratifiedKFold with multiple outputs? Or is there an alternative that accomplishes what I need?

from numpy import mean, std
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import Pipeline
from matplotlib import pyplot
def get_models():
   models = dict()
   for i in range(4,20):
      rfe = RFE(estimator = DecisionTreeClassifier(), n_features_to_select = i)
      model = DecisionTreeClassifier()
      models[str(i)] = Pipeline(steps=[('s', rfe), ('m', model)])
   return models
# imp_data is my own DataFrame
x = imp_data.iloc[:,:34]
y = imp_data.iloc[:,39]
def evaluate_model(model,x,y):
   cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
   scores = cross_val_score(model, x, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
   return scores
models = get_models()
results, names = list(), list()
for name,model in models.items():
   scores = evaluate_model(model,x,y)
   results.append(scores)
   names.append(name)
   print('>%s %.3f (%.3f)' % (name, mean(scores), std(scores)))

1 Answer

As far as I know, you can use cross_validate() as an alternative for targets with multiple outputs. You can pass the cross-validation technique and the scoring metrics of your preference. You can check the link below for more detail:

https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html
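A minimal sketch of this idea, using generated multi-label data in place of your `imp_data` (which I don't have access to). Note that `RepeatedStratifiedKFold` itself rejects multi-output targets, so this sketch falls back to `RepeatedKFold` inside `cross_validate`; `DecisionTreeClassifier` natively supports multi-output `y`, and `'accuracy'` here is subset accuracy over all labels:

```python
from numpy import mean, std
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import cross_validate, RepeatedKFold
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for imp_data: 200 samples, 10 features, 3 binary outputs
X, y = make_multilabel_classification(n_samples=200, n_features=10,
                                      n_classes=3, random_state=0)

# Stratification is not defined for multi-output targets,
# so use RepeatedKFold instead of RepeatedStratifiedKFold
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)

model = DecisionTreeClassifier(random_state=0)
results = cross_validate(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)

# results['test_score'] holds one score per fold (5 splits x 3 repeats = 15)
print('%.3f (%.3f)' % (mean(results['test_score']), std(results['test_score'])))
```

You could wrap this in your existing `evaluate_model` loop over the RFE pipelines unchanged; only the `cv` object and the call to `cross_validate` differ.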