Questions tagged [k-fold]

A cross-validation technique in which the data is partitioned into k subsets (or "folds"). In each of k iterations, k-1 folds are used for training and the remaining fold for evaluation, so that every fold is left out for evaluation exactly once.
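For illustration, a minimal scikit-learn sketch of the procedure described above; the data and the logistic-regression model are placeholders:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = np.random.rand(100, 5), np.random.randint(0, 2, 100)   # placeholder data
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])   # train on k-1 folds
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))  # evaluate on the held-out fold
print(np.mean(scores))   # average score over the k folds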

284 questions
1
vote
0 answers

Stratified KFold Cross Validation (Keras) ValueError: Found array with dim 4. Estimator expected <= 2

I need to cross-validate a Keras model using stratified k-fold (an imbalanced multiclass task). Is it possible to use x_train/y_train with ImageDataGenerator (flow_from_directory) in (folds = list(StratifiedKFold(k, shuffle=True,…
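This does not address the ImageDataGenerator part, but one workaround often suggested for the "dim 4" error is to hand StratifiedKFold.split() a dummy 2-D array, since stratification only needs the labels. A sketch with hypothetical array shapes standing in for the real data:

import numpy as np
from sklearn.model_selection import StratifiedKFold

# hypothetical stand-ins for the real arrays: 4-D images and integer class labels
x_train = np.random.rand(60, 32, 32, 3)
y_train = np.random.randint(0, 3, 60)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
# split() only needs the labels for stratification, so a dummy 2-D X sidesteps the dim check
folds = list(skf.split(np.zeros((len(y_train), 1)), y_train))
for train_idx, val_idx in folds:
    x_tr, y_tr = x_train[train_idx], y_train[train_idx]
    x_val, y_val = x_train[val_idx], y_train[val_idx]
    # build and fit the Keras model on (x_tr, y_tr), validating on (x_val, y_val)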
1
vote
1 answer

Is cross validation used for model selection?

This is starting to confuse me a bit. Take, for example, the following code that trains a GLM model: glm_sens = train( form = target ~ ., data = ABT, trControl = trainControl(method = "repeatedcv", number = 5, repeats = 10, classProbs =…
Piet Hein
  • 184
  • 2
  • 16
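The question is posed in R/caret, but the idea generalises; a scikit-learn analogue (with placeholder data and candidate models) of using repeated CV to pick between models and then refitting the winner on all training data:

import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = np.random.rand(200, 8), np.random.randint(0, 2, 200)   # placeholder data
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)   # mirrors repeatedcv, number=5, repeats=10
candidates = {"glm": LogisticRegression(max_iter=1000), "rf": RandomForestClassifier(random_state=0)}
mean_scores = {name: cross_val_score(est, X, y, cv=cv).mean() for name, est in candidates.items()}
best_name = max(mean_scores, key=mean_scores.get)          # model selection via the CV estimate
final_model = candidates[best_name].fit(X, y)              # refit the selected model on all training data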
1
vote
1 answer

K-Fold Cross Validation on entire Dataset

I would like to know whether my current procedure is correct or whether I might have data leakage. After importing the dataset, I split it with an 80/20 ratio. X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.20, random_state=0,…
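One leak-free pattern, sketched with placeholder data and a hypothetical estimator: cross-validate (and tune) on the training split only, and touch the held-out test split exactly once at the end:

import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier

X, y = np.random.rand(500, 10), np.random.randint(0, 2, 500)   # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0, stratify=y)
model = RandomForestClassifier(random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=5)   # all selection/tuning happens here
model.fit(X_train, y_train)
test_score = model.score(X_test, y_test)                     # reported once, at the very end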
1
vote
1 answer

Probabilities from cross_val_predict using RepeatedStratifiedKFold 5*10

My goal is to calculate the AUC, specificity, and sensitivity with 95% CIs from a 5*10 stratified k-fold CV. I also need the specificity and sensitivity at a threshold of 0.4 to maximize the sensitivity. So far I have been able to implement it for the AUC.…
Mischa
  • 83
  • 1
  • 10
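cross_val_predict requires every sample to appear in exactly one test fold, so with repeated CV a manual loop is one option. A sketch with placeholder data and a hypothetical classifier, collecting per-split AUC plus sensitivity/specificity at the 0.4 threshold mentioned in the question:

import numpy as np
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = np.random.rand(300, 6), np.random.randint(0, 2, 300)   # placeholder data
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
aucs, sens, spec = [], [], []
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], proba))
    pred = (proba >= 0.4).astype(int)                         # threshold of 0.4
    tp = np.sum((pred == 1) & (y[test_idx] == 1))
    tn = np.sum((pred == 0) & (y[test_idx] == 0))
    fp = np.sum((pred == 1) & (y[test_idx] == 0))
    fn = np.sum((pred == 0) & (y[test_idx] == 1))
    sens.append(tp / (tp + fn))
    spec.append(tn / (tn + fp))
# the per-split lists can then be summarised, e.g. with a mean and a percentile-based 95% interval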
1
vote
0 answers

Naive Bayes NLTK Cross Validation

I have a problem understanding how KFold cross-validation works in the new model_selection module. I am using a Naive Bayes classifier and I would like to test it using cross-validation. My test and train data are split like this: test_set =…
Simm
  • 89
  • 1
  • 10
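One way to cross-validate an NLTK classifier is to let KFold generate indices over the list of feature sets and slice it per fold. A sketch with toy data; `featuresets` is an assumed list of (feature_dict, label) pairs:

import numpy as np
import nltk
from sklearn.model_selection import KFold

# toy stand-in for the real (feature_dict, label) pairs
words = ['good', 'excellent', 'bad', 'awful', 'fine', 'great', 'poor', 'superb', 'weak', 'solid'] * 5
featuresets = [({'len': len(w)}, 'pos' if len(w) > 4 else 'neg') for w in words]

kf = KFold(n_splits=5, shuffle=True, random_state=0)
accuracies = []
for train_idx, test_idx in kf.split(featuresets):
    train_set = [featuresets[i] for i in train_idx]
    test_set = [featuresets[i] for i in test_idx]
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    accuracies.append(nltk.classify.accuracy(classifier, test_set))
print(np.mean(accuracies))   # mean accuracy over the folds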
1
vote
0 answers

How to use k-fold cross-validation with the 'patternnet' neural network in Matlab?

I'm trying to use k-fold cross-validation with the patternnet neural network. inputs1 is a feature vector and targets1 is the label vector from 'iris_dataset', and xtrain, xtest, ytrain, and ytest are the training & testing features and labels respectively…
Ellie
  • 303
  • 2
  • 16
1
vote
0 answers

Error using R caret package (train) with C5.0 decision tree to do K-fold cross validation

NOW SOLVED. The problem was data=OneT.train, which was wrong. This code was copied over from the original. It needs to be data=OneT in the caret train() function. The current OneT.train had missing values in an attribute field, not the target, from…
user13248694
1
vote
0 answers

What to do after Stratified K-fold?

I have used StratifiedKFold to cross-validate my training data set. The model has achieved an accuracy of 75%, which I find acceptable. Should I just continue and apply my model to the test set: model.fit(X_train, y_train) y_pred =…
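A common next step, sketched with placeholder data and a hypothetical model: treat the k-fold result as the performance estimate, then refit on the full training split and evaluate once on the held-out test set:

import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = np.random.rand(400, 8), np.random.randint(0, 2, 400)   # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_acc = cross_val_score(model, X_train, y_train, cv=skf).mean()   # the "75%"-style estimate
model.fit(X_train, y_train)                                        # refit on all training data
test_acc = accuracy_score(y_test, model.predict(X_test))           # single final check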
1
vote
0 answers

xgb.cv's auc score is not matching with cross_val_score when `colsample_bytree` is other than 1

I am working on a highly imbalanced dataset. During hyperparameter tuning, I found that if colsample_bytree is set to a value other than 1, the cross_val_score from the sklearn package does not match the AUC score obtained from xgb.cv. xgb.cv…
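The discrepancy itself isn't diagnosed here, but one way to make the two numbers comparable is to hand both APIs identical splits. A sketch with placeholder data, a fixed seed, and an illustrative colsample_bytree of 0.8:

import numpy as np
import xgboost as xgb
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = np.random.rand(500, 10), np.random.randint(0, 2, 500)   # placeholder data
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
splits = list(skf.split(X, y))                                  # the exact same folds for both APIs

params = {'objective': 'binary:logistic', 'eval_metric': 'auc',
          'colsample_bytree': 0.8, 'seed': 0}
cv_res = xgb.cv(params, xgb.DMatrix(X, label=y), num_boost_round=100,
                folds=splits, seed=0)                           # native CV on the explicit folds

clf = xgb.XGBClassifier(n_estimators=100, colsample_bytree=0.8, random_state=0)
skl_auc = cross_val_score(clf, X, y, cv=skf, scoring='roc_auc').mean()   # sklearn CV on the same splitter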
1
vote
1 answer

LeavePGroupsOut For multidimensional array

I am working on a research problem and, due to a small dataset of subjects, I am trying to implement leave-N-out style analyses. Currently I am doing this ad hoc, and I stumbled upon scikit-learn's LeavePGroupsOut function. I read the docs but I…
konsalex
  • 425
  • 5
  • 15
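A small sketch of LeavePGroupsOut with a subject-grouped, multidimensional array; the shapes and group labels are hypothetical. split() only yields row indices, so the extra dimensions of X are untouched:

import numpy as np
from sklearn.model_selection import LeavePGroupsOut

X = np.random.rand(12, 4, 32)              # e.g. (samples, channels, timepoints), placeholder
y = np.random.randint(0, 2, 12)
groups = np.repeat(np.arange(6), 2)        # 6 subjects, 2 samples each
lpgo = LeavePGroupsOut(n_groups=2)         # leave 2 subjects out per split
for train_idx, test_idx in lpgo.split(X, y, groups):
    X_train, X_test = X[train_idx], X[test_idx]   # still 3-D; reshape later only if the estimator needs 2-D
    y_train, y_test = y[train_idx], y[test_idx]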
1
vote
0 answers

RandomForestRegressor - K-fold CV cross_val_predict never complete

I'm using RandomForestRegressor to generate new features. The old script took 20 mins but still completed... param_grid = { 'n_estimators': [10, 50, 100, 1000], 'max_depth' : [4,5,6,7,8], } def rfr_model(X, Y): --Perform…
Katereena
  • 11
  • 1
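Not the asker's script, just a sketch of the usual speed levers, with placeholder data and a trimmed grid: parallelise with n_jobs=-1 and keep verbose output so the search visibly progresses rather than appearing to hang:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, cross_val_predict

X, y = np.random.rand(200, 10), np.random.rand(200)            # placeholder data
param_grid = {'n_estimators': [10, 50, 100], 'max_depth': [4, 6, 8]}   # trimmed for illustration
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=5, n_jobs=-1, verbose=2)               # parallel, with progress output
search.fit(X, y)
oof_pred = cross_val_predict(search.best_estimator_, X, y, cv=5, n_jobs=-1)   # out-of-fold predictions as features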
1
vote
1 answer

How to Retain The Evaluation Score of kfold using cross_val_score()

I want to understand k-fold more clearly, and how to choose the best model after it is used as a cross-validation method. According to this source: https://machinelearningmastery.com/k-fold-cross-validation/ the steps to carry out k-fold…
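For reference, cross_val_score already returns the per-fold scores, and cross_validate can additionally keep each fold's fitted estimator; a sketch with placeholder data and a hypothetical model:

import numpy as np
from sklearn.model_selection import cross_val_score, cross_validate
from sklearn.linear_model import LogisticRegression

X, y = np.random.rand(200, 5), np.random.randint(0, 2, 200)   # placeholder data
model = LogisticRegression(max_iter=1000)
fold_scores = cross_val_score(model, X, y, cv=5)               # one score per fold
results = cross_validate(model, X, y, cv=5, return_estimator=True)
fold_models = results['estimator']                             # the k fitted models, one per fold
print(fold_scores, fold_scores.mean())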
1
vote
1 answer

How to do kfold cross-validation for multi-input models

The model is as below: inputs_1 = keras.Input(shape=(10081,1)) layer1 = Conv1D(64,14)(inputs_1) layer2 = layers.MaxPool1D(5)(layer1) layer3 = Conv1D(64, 14)(layer2) layer4 = layers.GlobalMaxPooling1D()(layer3) inputs_2 = keras.Input(shape=(85,)) …
nilsinelabore
  • 4,143
  • 17
  • 65
  • 122
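One way to k-fold a multi-input model is to generate the index splits once and slice every input array with them. A sketch in which the layer shapes follow the question but the data, sample count, and training settings are placeholders:

import numpy as np
from sklearn.model_selection import KFold
from tensorflow import keras
from tensorflow.keras import layers

n = 60                                           # placeholder sample count
x1 = np.random.rand(n, 10081, 1)
x2 = np.random.rand(n, 85)
y = np.random.randint(0, 2, n)

def build_model():
    inputs_1 = keras.Input(shape=(10081, 1))
    a = layers.Conv1D(64, 14)(inputs_1)
    a = layers.MaxPool1D(5)(a)
    a = layers.Conv1D(64, 14)(a)
    a = layers.GlobalMaxPooling1D()(a)
    inputs_2 = keras.Input(shape=(85,))
    merged = layers.concatenate([a, inputs_2])
    out = layers.Dense(1, activation='sigmoid')(merged)
    model = keras.Model([inputs_1, inputs_2], out)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in kf.split(x1):          # indices apply to both inputs and the labels
    model = build_model()                        # fresh model per fold
    model.fit([x1[train_idx], x2[train_idx]], y[train_idx],
              validation_data=([x1[val_idx], x2[val_idx]], y[val_idx]),
              epochs=1, verbose=0)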
1
vote
1 answer

How to apply Kfold with TfidfVectorizer?

I'm having an issue applying K-fold cross-validation with TF-IDF. It gives me this error: ValueError: setting an array element with a sequence. I have seen other questions that had the same problem, but they were using train_test_split(). It's a little…
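One commonly suggested pattern is to wrap the vectorizer and classifier in a Pipeline so TF-IDF is fit inside each fold and raw text can be passed straight to cross_val_score; a sketch with toy data and a hypothetical classifier:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

texts = ["good movie", "bad movie", "great film", "terrible film",
         "loved it", "hated it", "wonderful", "awful", "nice plot", "boring plot"]   # toy data
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LogisticRegression())])
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, texts, labels, cv=skf)   # vectorizer refit on each fold's training text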
1
vote
1 answer

Is it possible to get back the list in stratifiedKFold?

I'd like to do something like this: Skf = sklearn.model_selection.StratifiedKFold(n_splits = 5, shuffle = True) ALPHA,BETA = Skf.split(data_X, data_Y) and then: for train_index, test_index in ALPHA,BETA However, it isn't working. Why, and how…
Marine Galantin
  • 1,634
  • 1
  • 17
  • 28
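split() returns a generator of (train_index, test_index) pairs, so it cannot be unpacked into two separate variables; it can be materialised with list() or iterated directly. A sketch with placeholder data:

import numpy as np
from sklearn.model_selection import StratifiedKFold

data_X = np.random.rand(50, 4)                  # placeholder data
data_Y = np.random.randint(0, 2, 50)
skf = StratifiedKFold(n_splits=5, shuffle=True)
splits = list(skf.split(data_X, data_Y))        # a reusable list of 5 (train_index, test_index) pairs
for train_index, test_index in splits:
    X_train, X_test = data_X[train_index], data_X[test_index]
    y_train, y_test = data_Y[train_index], data_Y[test_index]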