A technique in cross-validation where the data is partitioned into k subsets (or "folds"); k-1 folds are used for training and the remaining fold for evaluation. The process is repeated k times, holding out a different fold for evaluation each time, and the k evaluation scores are typically averaged.
Questions tagged [k-fold]
284 questions
3 votes, 2 answers
sklearn's KFold function with shuffle and random_state
I'm trying to understand how to use the cross-validation function sklearn.model_selection.KFold. If I define (like in this tutorial)
from sklearn.model_selection import KFold
kf = KFold(n_splits=5, shuffle=False, random_state=100)
I…

Medulla Oblongata (3,771 reputation)
3 votes, 1 answer
Why should we call split() when passing StratifiedKFold() as a parameter of GridSearchCV?
What am I trying to do?
I am trying to use StratifiedKFold() in GridSearchCV().
So what confuses me?
When we use k-fold cross-validation, we just pass the number of folds as the cv argument of GridSearchCV(), like the following:
grid_search_m =…

Md. Sabbir Ahmed (850 reputation)
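One way to see why calling `split()` is not required: `GridSearchCV`'s `cv` argument accepts an integer, a CV splitter object, or an iterable of (train, test) index pairs, so the `StratifiedKFold` object itself can be passed and `GridSearchCV` calls its `split(X, y)` internally. For an integer `cv` with a classifier and a binary or multiclass target, scikit-learn already uses an unshuffled `StratifiedKFold`. The iris data, `LogisticRegression`, and the `C` grid below are stand-ins chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = load_iris(return_X_y=True)

# Pass the splitter object itself; GridSearchCV calls skf.split(X, y) for us
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
grid_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=skf,
)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

Passing the splitter object rather than an integer is mainly useful for controlling shuffling and the seed.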
3 votes, 1 answer
How do I access the datasets after running k-fold with scikit-learn?
I'm trying to apply the k-fold method, but I don't know how to access the training and testing sets it generates. After going through several blogs and the scikit-learn user guide, all people seem to do is print the training and testing sets. This…

dekio (810 reputation)
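A minimal sketch of getting at the actual fold data: `KFold.split()` yields arrays of positional indices, not the data itself, so each fold's subsets are obtained by indexing (`.iloc` for pandas objects, plain `[...]` for NumPy arrays) and can be stored for later use. The toy DataFrame and its column names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Toy data standing in for the real dataset
df = pd.DataFrame({"feature": np.arange(10), "target": np.arange(10) % 2})
X = df[["feature"]]
y = df["target"]

kf = KFold(n_splits=5, shuffle=True, random_state=42)
folds = []  # one (X_train, X_test, y_train, y_test) tuple per fold
for train_idx, test_idx in kf.split(X):
    # split() yields positional index arrays, so use .iloc on pandas objects
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
    folds.append((X_train, X_test, y_train, y_test))

X_train0, X_test0, y_train0, y_test0 = folds[0]
print(len(X_train0), len(X_test0))  # 8 2
```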
2 votes, 1 answer
Why can sklearn's KFold only be enumerated once (also on using it in xgboost.cv)?
I'm trying to create a KFold object for my xgboost.cv, and I have
import pandas as pd
from sklearn.model_selection import KFold
df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10]])
KF = KFold(n_splits=2)
kf = KF.split(df)
But it seems I can only enumerate…

Yue Y (583 reputation)
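The behaviour described comes from `KF.split(df)` returning a generator: one full pass consumes it, and a second pass yields nothing. The `KFold` object itself is reusable, so either materialize the folds with `list(...)` or call `split()` again. As far as I recall, the `folds` argument of `xgboost.cv` is documented as accepting a scikit-learn `KFold`/`StratifiedKFold` object or a list of (train, test) index pairs, so passing `KF` rather than `KF.split(df)` should also avoid the problem; check this against your xgboost version.

```python
import pandas as pd
from sklearn.model_selection import KFold

df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
KF = KFold(n_splits=2)

kf = KF.split(df)        # a generator, not a list
first_pass = list(kf)    # consumes the generator
second_pass = list(kf)   # already exhausted
print(len(first_pass), len(second_pass))  # 2 0

# Materialize the folds once; a list can be iterated any number of times
folds = list(KF.split(df))
print(len(folds))  # 2

# Or simply call split() again whenever a fresh pass is needed
print(len(list(KF.split(df))))  # 2
```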
2 votes, 0 answers
How to perform a k-fold cross validation in Google Earth Engine?
I am interested in performing k-fold cross-validation and an accuracy assessment for a land cover classification in Google Earth Engine. I have compiled the code below, which partitions the training data into a training and testing subset for model…

Shaeden Gokool (57 reputation)
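The Earth Engine code itself is not shown. One common way to get k folds from a single sample table is to assume each sample carries a uniform random value in [0, 1) (Earth Engine's `randomColumn()` on a `FeatureCollection` adds such a column), bucket it into fold ids with `floor(value * k)`, then for each fold i train on samples with `fold != i` and test on `fold == i`. The sketch below shows only that bucketing logic in NumPy; it is not Earth Engine code, and `assign_folds` is a hypothetical name.

```python
import numpy as np

def assign_folds(random_values, k):
    """Map uniform values in [0, 1) to fold ids 0..k-1 by equal-width buckets."""
    return np.floor(np.asarray(random_values) * k).astype(int)

rng = np.random.default_rng(0)
rand_col = rng.random(100)          # stand-in for a per-sample random column
folds = assign_folds(rand_col, k=5)

for i in range(5):
    train_mask = folds != i         # train on the other k-1 folds
    test_mask = folds == i          # evaluate on fold i
    print(i, train_mask.sum(), test_mask.sum())
```

Because the values are random, fold sizes are only approximately equal.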
2 votes, 1 answer
sklearn.model_selection.cross_val_score has different results from a manual calculation done on a confusion matrix
TL;DR: When I calculate precision, recall, and F1 with cross_val_score(), I get different results than when I calculate them from the confusion matrix. Why are the precision, recall, and F1 scores different?
I'm learning SVM in machine…

Seven (330 reputation)
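A likely source of the discrepancy: `cross_val_score` returns one score per fold, and `.mean()` averages those, whereas a confusion matrix built from out-of-fold predictions (for example via `cross_val_predict`) pools every prediction before precision, recall, or F1 is computed. Because these metrics are ratios, the mean of per-fold F1 scores is generally not equal to the F1 of the pooled predictions. The dataset and `SVC` pipeline below are stand-ins for the asker's setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), SVC())
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean of the five per-fold F1 scores
per_fold_f1 = cross_val_score(model, X, y, cv=cv, scoring="f1")
mean_f1 = per_fold_f1.mean()

# F1 computed once from all out-of-fold predictions pooled together
y_pred = cross_val_predict(model, X, y, cv=cv)
tn, fp, fn, tp = confusion_matrix(y, y_pred).ravel()
pooled_f1 = 2 * tp / (2 * tp + fp + fn)

# pooled_f1 matches f1_score on the pooled predictions; mean_f1 need not
print(mean_f1, pooled_f1, f1_score(y, y_pred))
```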
2 votes, 1 answer
How to apply k-fold cross-validation using tf.keras.utils.image_dataset_from_directory
My aim is to apply k-fold cross-validation to train a VGG19 model. To do so, I read my images from a directory using the following code:
DIR = "/Images"
data_dir = pathlib.Path(os.getcwd() + '\\Images')
train_ds =…

ali eskandari (55 reputation)
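Since `image_dataset_from_directory` produces at most a single train/validation split, one workaround is to do the fold split on the list of file paths and labels first, then build a separate dataset for each fold (for example with `tf.data.Dataset.from_tensor_slices`). The sketch covers only the splitting step; the one-subfolder-per-class layout and the helper names `list_images` and `path_folds` are assumptions, and the TensorFlow side is left out.

```python
import pathlib

import numpy as np
from sklearn.model_selection import StratifiedKFold

def list_images(root):
    """Hypothetical helper: collect file paths and class names from root/<class>/<file>."""
    root = pathlib.Path(root)
    paths, labels = [], []
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for f in sorted(class_dir.iterdir()):
            if f.is_file():
                paths.append(str(f))
                labels.append(class_dir.name)
    return np.array(paths), np.array(labels)

def path_folds(paths, labels, n_splits=5, seed=0):
    """Hypothetical helper: yield (train_paths, train_labels, val_paths, val_labels) per fold."""
    paths, labels = np.asarray(paths), np.asarray(labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in skf.split(paths, labels):
        yield paths[train_idx], labels[train_idx], paths[val_idx], labels[val_idx]

# Assumed usage, with one subfolder per class under "Images":
# paths, labels = list_images(pathlib.Path.cwd() / "Images")
# for tr_paths, tr_labels, va_paths, va_labels in path_folds(paths, labels):
#     ...build one training and one validation dataset, then train a fresh model
```

Stratifying keeps each class's share roughly the same in every validation fold.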
2 votes, 1 answer
TypeError: Target data is missing. Your model has `loss`: binary_crossentropy, and therefore expects target data to be passed in `fit()`
I'm trying to run a model and validate it using stratified k-fold cross-validation. I have stored the training and testing images together in a new folder, and stored the ground-truth labels for both training and testing in a CSV file.
I'm using…

arvind okram (63 reputation)
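The error says `fit()` received inputs but no targets. With stratified k-fold over images listed in a CSV, one way to keep inputs and labels aligned is to split the rows of the label table and, for each fold, pass both the loaded images and the matching labels to `fit(x, y)` (or build a dataset that yields (image, label) pairs). The sketch below shows only the fold construction; the column names `filename` and `label` are hypothetical stand-ins for the CSV.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for the CSV of ground truths (column names are assumptions)
labels_df = pd.DataFrame({
    "filename": [f"img_{i:03d}.png" for i in range(12)],
    "label": [0] * 8 + [1] * 4,
})

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(labels_df["filename"], labels_df["label"])):
    train_rows = labels_df.iloc[train_idx]
    val_rows = labels_df.iloc[val_idx]
    # Each validation fold keeps the 2:1 class ratio. Load the images named in
    # train_rows["filename"] and pass them to fit() together with train_rows["label"].
    print(fold, val_rows["label"].tolist())
```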
2 votes, 2 answers
How is it that the accuracy score for 10-fold cross-validation is worse than for a 90-10 train_test_split using sklearn?
The task is binary classification via a neural network. The data is stored in a dictionary that maps the composite name of each entry (the key) to a vector whose third element is the label (0 or 1). The first and second elements…

oliver.c (65 reputation)
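A single 90-10 split gives one accuracy measured on one small test set, which can come out high or low by chance. 10-fold cross-validation averages over ten test folds that together cover every sample once, so a lower 10-fold mean does not necessarily indicate a worse model; the single split may simply have been a favourable draw. The sketch below compares several single splits with one 10-fold run on a bundled dataset, using logistic regression as a stand-in for the neural network.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Ten different 90/10 splits: each gives one accuracy from 57 test samples
single_split_scores = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=seed)
    single_split_scores.append(model.fit(X_tr, y_tr).score(X_te, y_te))

# 10-fold CV: ten test folds that together cover every sample exactly once
cv_scores = cross_val_score(model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))

print("90/10 splits:", np.round(single_split_scores, 3))
print("10-fold mean/std:", cv_scores.mean().round(3), cv_scores.std().round(3))
```

The spread across the single splits gives a sense of how much one 90-10 score can vary by chance.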
2 votes, 1 answer
ValueError: The number of folds must be of Integral type. [array([[0.25 , 0.
I used an extreme learning machine (ELM) model for regression, and K-fold to validate the model's predictions. But after executing the following code, I get this error message:
ValueError: The number of folds must be of Integral type.…

sera (63 reputation)
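The question's code is truncated, so this is a guess at the cause. scikit-learn's `KFold` constructor raises exactly this `ValueError` when `n_splits` is not an integer, and a common way to trigger it is passing the data as the first positional argument (for example `KFold(X)`), since that argument is `n_splits`. The data belongs in `split()`. The small arrays below are illustrative stand-ins.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[0.25, 0.5], [0.1, 0.3], [0.7, 0.2], [0.4, 0.9], [0.6, 0.8]])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

message = None
try:
    KFold(X)  # wrong: the first positional argument is n_splits
except ValueError as err:
    message = str(err)
print(message)  # begins "The number of folds must be of Integral type."

# Right: the number of folds goes to the constructor, the data goes to split()
kf = KFold(n_splits=5)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
```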
2 votes, 0 answers
KFold cross-validation: shuffle=True vs shuffle=False
Should I set shuffle=True in sklearn.model_selection.KFold?
I'm trying to evaluate my model with cross_val_score on a given dataset.
If I write
cross_val_score(estimator=model, X=X, y=y, cv=KFold(shuffle=False),…

James Arten (523 reputation)
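Part of the answer depends on row order. With `shuffle=False`, `KFold` cuts the data into consecutive blocks in the current order, so if the rows are sorted (by class, date, source, and so on) each test fold can be unrepresentative of the whole. The toy example below uses labels sorted by class; the data is illustrative only.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10).reshape(-1, 1)
y = np.array([0] * 5 + [1] * 5)  # labels sorted by class

for shuffle in (False, True):
    kf = KFold(n_splits=2, shuffle=shuffle, random_state=0 if shuffle else None)
    test_labels = [sorted(set(y[test].tolist())) for _, test in kf.split(X)]
    print(f"shuffle={shuffle}: classes in each test fold -> {test_labels}")
# With shuffle=False the two test folds contain only class 0 and only class 1,
# so each model is trained on one class and tested on the other.
```

For classification, `StratifiedKFold` is another way to keep class proportions similar across folds.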
2 votes, 2 answers
Generic Function for K-Fold Cross-Validation in R for Linear Models
Hi guys, I need help troubleshooting the function below. I am using R.
The dataset I am using is called wages, and it comes from the ISLR package:
library(ISLR)
data(wages)
Anyhow, I am trying to develop a function that allows me to perform…

Tareq (31 reputation)
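The R function is not shown. To illustrate what a generic k-fold routine for a linear model typically does, here is a language-neutral sketch written in Python/NumPy rather than R, and not the asker's code. It shuffles the row indices, splits them into k folds, fits ordinary least squares on k-1 folds, computes the mean squared error on the held-out fold, and averages the k values. The function name `kfold_cv_mse` is hypothetical.

```python
import numpy as np

def kfold_cv_mse(X, y, k=10, seed=1):
    """Estimate the test MSE of an ordinary least-squares fit with k-fold CV."""
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)  # k roughly equal folds
    mses = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        resid = y[test_idx] - X[test_idx] @ beta
        mses.append(np.mean(resid ** 2))
    return float(np.mean(mses))  # unweighted average of the k fold MSEs
```

An R version would follow the same steps, with `lm()` on the training rows and `predict()` on the held-out rows.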
2 votes, 2 answers
ImageDataGenerator.flow_from_directory to a dataset that can be used in Kfold
I am trying to use cross-validation for a model that classifies images into 3 classes. I use the following code to import images:
train_datagen = ImageDataGenerator(rescale=1./255)
data =…

Jacqueline (49 reputation)
2 votes, 1 answer
Nested cross-validation with GroupKFold with sklearn
In my data, several entries correspond to a single subject, and I don't want to mix those entries between the train and the test set. For this reason, I looked at the GroupKFold fold iterator, which according to the sklearn documentation is a "K-fold…

giograno (1,749 reputation)
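A minimal sketch of nested cross-validation that keeps subjects intact at both levels, using synthetic data with a hypothetical subject id per row. The outer `GroupKFold` loop holds out whole subjects for evaluation. Inside it, `GridSearchCV` tunes hyperparameters with its own `GroupKFold`, and `groups=` is passed to `fit()` so the inner splits also respect subjects. `LogisticRegression` and the `C` grid are stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold

rng = np.random.default_rng(0)
n_subjects, per_subject = 12, 10
groups = np.repeat(np.arange(n_subjects), per_subject)  # subject id of each row
X = rng.normal(size=(len(groups), 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=len(groups)) > 0).astype(int)

outer_cv = GroupKFold(n_splits=3)
inner_cv = GroupKFold(n_splits=3)
outer_scores = []
for train_idx, test_idx in outer_cv.split(X, y, groups):
    # Inner loop: tune C on the outer-training subjects only, still split by subject
    search = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0, 10.0]}, cv=inner_cv)
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
    # Outer loop: score the tuned model on subjects it has never seen
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

print(np.mean(outer_scores))
```

`GroupKFold` needs at least as many distinct groups as splits, which holds at both levels here (12 subjects outside, 8 inside).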
2 votes, 1 answer
What is the correct way to use standardization/normalization in combination with K-Fold Cross Validation?
I have always learned that standardization or normalization should be fit only on the training set, and then be used to transform the test set. So what I'd do is:
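from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit on the training set only
X_test = scaler.transform(X_test)        # reuse the training-set statistics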
Now…

Sievag (33 reputation)
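The same fit-on-train-only rule applies inside every fold. In scikit-learn, the simplest way to enforce it is to put the scaler in a `Pipeline`: `cross_val_score` refits the whole pipeline on each fold's training rows, so the scaler never sees the held-out rows. The sketch shows the pipeline version next to an equivalent hand-written loop; the bundled dataset and `LogisticRegression` are stand-ins.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Option 1: scaler inside a pipeline, refit from scratch in every fold
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=cv)

# Option 2: the same procedure written out by hand
manual_scores = []
for train_idx, test_idx in cv.split(X):
    scaler = StandardScaler().fit(X[train_idx])  # statistics from the training fold only
    X_train = scaler.transform(X[train_idx])
    X_test = scaler.transform(X[test_idx])       # reuse the training-fold statistics
    clf = LogisticRegression(max_iter=1000).fit(X_train, y[train_idx])
    manual_scores.append(clf.score(X_test, y[test_idx]))

print(np.mean(scores), np.mean(manual_scores))
```

Both options use identical folds (a fixed `random_state`), so the per-fold scores should match.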