Questions tagged [k-fold]

A cross-validation technique in which the data is partitioned into k subsets (or "folds"): k-1 folds are used for training and the remaining fold for evaluation. The process is repeated k times, holding out a different fold for evaluation each time.
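
As a rough illustration of the partitioning described above, here is a minimal dependency-free sketch. The helper name `k_fold_indices` is made up for this example and is not part of any library; it assumes contiguous, unshuffled folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) index lists for k contiguous folds.

    Hypothetical helper for illustration only: no shuffling, and the
    first (n_samples % k) folds receive one extra sample each.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        # The held-out fold is used for evaluation...
        test_idx = list(range(start, stop))
        # ...and the other k-1 folds together form the training set.
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample lands in exactly one evaluation fold, so every data point is used for testing once and for training k-1 times.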

284 questions
3
votes
2 answers

sklearn's KFold function with shuffle and random_state

I'm trying to understand how to use the cross-validation function sklearn.model_selection.KFold. If I define (like in this tutorial) from sklearn.model_selection import KFold kf = KFold(n_splits=5, shuffle=False, random_state=100) I…
Medulla Oblongata
  • 3,771
  • 8
  • 36
  • 75
3
votes
1 answer

Why we should call split() function during passing StratifiedKFold() as a parameter of GridSearchCV?

What am I trying to do? I am trying to use StratifiedKFold() in GridSearchCV(). What confuses me? When we use K-fold cross-validation, we just pass the number of CV folds inside GridSearchCV(), like the following. grid_search_m =…
3
votes
1 answer

How do I access the datasets after running k-fold with scikit-learn?

I'm trying to apply the k-fold method, but I don't know how to access the training and testing sets generated. After going through several blogs and the scikit-learn user guide, the only thing people do is to print the training and testing sets. This…
dekio
  • 810
  • 3
  • 16
  • 33
2
votes
1 answer

Why sklearn's KFold can only be enumerated once (also on using it in xgboost.cv)?

Trying to create a KFold object for my xgboost.cv, and I have import pandas as pd from sklearn.model_selection import KFold df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10]]) KF = KFold(n_splits=2) kf = KF.split(df) But it seems I can only enumerate…
Yue Y
  • 583
  • 1
  • 6
  • 24
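
A hedged note on the behaviour the question above describes: in scikit-learn, `KFold.split()` returns a generator, so the object it returns is exhausted after one pass. The sketch below uses a small made-up NumPy array (not the asker's DataFrame) to show this, along with one possible workaround of materialising the splits into a list.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data, assumed for illustration: 10 samples, 2 features.
X = np.arange(20).reshape(10, 2)

kf = KFold(n_splits=2)

# split() returns a generator, which can be consumed only once.
splits = kf.split(X)
first_pass = [(train.tolist(), test.tolist()) for train, test in splits]
second_pass = list(splits)  # already exhausted, so this is empty

# Materialising the generator gives a list that can be reused.
reusable = list(kf.split(X))

print(len(first_pass), len(second_pass), len(reusable))
```

Calling `kf.split(X)` again also works, since the `KFold` object itself is reusable; only the generator it hands back is single-use.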
2
votes
0 answers

How to perform a k-fold cross validation in Google Earth Engine?

I am interested in performing a k-fold cross validation and accuracy assessment for a land cover classification in Google Earth Engine. I have compiled the code below which partitions the training data into a training and testing subset for model…
2
votes
1 answer

sklearn.model_selection.cross_val_score has different results from a manual calculation done on a confusion matrix

TL;DR When I calculate precision, recall, and f1 through CV cross_val_score(), it gives me different results than when I calculate through the confusion matrix. Why does it give different precision, recall, and f1 scores? I'm learning SVM in machine…
2
votes
1 answer

How apply kfold cross validation using tf.keras.utils.image_dataset_from_directory

My aim is to apply k-fold cross-validation for training a VGG19 model. In order to do so, I read my images from directory using the following code: DIR = "/Images" data_dir = pathlib.Path(os.getcwd() + '\\Images') train_ds =…
2
votes
1 answer

TypeError: Target data is missing. Your model has `loss`: binary_crossentropy, and therefore expects target data to be passed in `fit()`

I'm trying to run a model and validate it using stratified k-fold validation. I have stored the training and testing images together in a new folder and stored the ground truths of both training and testing in a CSV for taking a label. I'm using…
2
votes
2 answers

How is it that the accuracy score for 10-fold cross validation is worst than for a 90-10 train_test_split using sklearn?

The task is binary classification via a neural network. The data is stored in a dictionary that contains the composite name of each entry (as the key) and the label (0 or 1, as the third element in the vector value). The first and second elements…
oliver.c
  • 65
  • 1
  • 8
2
votes
1 answer

ValueError: The number of folds must be of Integral type. [array([[0.25 , 0.

I used an extreme learning machine (ELM) model for a regression task, and I used K-fold to validate the model's predictions. But after executing the following code I get this error message: ValueError: The number of folds must be of Integral type.…
sera
  • 63
  • 5
2
votes
0 answers

KFold cross validation: shuffle =True vs shuffle=False

Should I set shuffle=True in sklearn.model_selection.KFold? I'm in a situation where I'm trying to evaluate the cross_val_score of my model on a given dataset. If I write cross_val_score(estimator=model, X=X, y=y, cv=KFold(shuffle=False),…
James Arten
  • 523
  • 5
  • 16
2
votes
2 answers

Generic Function for K-Fold Cross-Validation In R for Linear Models

Hi guys, I need help troubleshooting the function below. I am using the R language. The dataset I am using is called wages and it is from a package called library(ISLR) data(wages). Anyhow, I am trying to develop a function that allows me to perform…
2
votes
2 answers

ImageDataGenerator.flow_from_directory to a dataset that can be used in Kfold

I am trying to use the cross validation approach for the model I use for classifying images into 3 classes. I use the following code to import images: train_datagen = ImageDataGenerator(rescale=1./255) data =…
2
votes
1 answer

Nested cross-validation with GroupKFold with sklearn

In my data, several entries correspond to a single subject and I don't want to mix those entries between the train and the test set. For this reason, I looked at the GroupKFold fold iterator, which according to the sklearn documentation is a "K-fold…
giograno
  • 1,749
  • 3
  • 18
  • 30
2
votes
1 answer

What is the correct way to use standardization/normalization in combination with K-Fold Cross Validation?

I have always learned that standardization or normalization should be fit only on the training set, and then be used to transform the test set. So what I'd do is: scaler = StandardScaler() scaler.fit_transform(X_train) scaler.transform(X_test) Now…
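
One common way to keep the scaler fitted on training folds only is to wrap it in a scikit-learn `Pipeline`, so that `cross_val_score` refits it inside every fold. The following is a minimal sketch under assumed conditions: the synthetic dataset, the `LogisticRegression` estimator, and the 5-fold setup are placeholders chosen for illustration, not taken from the question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, assumed for illustration only.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# The pipeline is cloned and refitted per fold, so StandardScaler only
# ever sees the training portion of each split before transforming the
# held-out portion.
model = make_pipeline(StandardScaler(), LogisticRegression())

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)

print(scores.mean())
```

Scaling the full dataset before splitting would instead leak the held-out folds' statistics into training, which is the situation the question is trying to avoid.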