A cross-validation technique in which the data is partitioned into k subsets (or "folds"). In each round, k-1 folds are used for training and the remaining fold is held out for evaluation. The process is repeated k times, holding out a different fold each time, so every sample is used for evaluation exactly once.
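A minimal sketch of the partitioning described above, in plain Python. The function name `k_fold_indices` is hypothetical and is not part of any library. It assumes contiguous, unshuffled folds whose sizes differ by at most one; library implementations may shuffle or stratify instead.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds.

    Hypothetical helper for illustration: folds are contiguous blocks of
    indices and no shuffling is performed.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        # Everything outside the held-out block is used for training.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 5):
        print("train:", train, "test:", test)
```

Each index appears in exactly one test fold, so averaging the k evaluation scores uses every sample once for evaluation and k-1 times for training.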
Questions tagged [k-fold]
284 questions
1
vote
1 answer
In GroupKFold in ScikitLearn error message: ValueError: too many values to unpack (expected 2)
When using GroupKFold from scikit-learn, I get an error message that I cannot understand given the documentation.
The error message is:
ValueError: too many values to unpack (expected 2)
The documentation states:
For a…

user8270077
1
vote
1 answer
Why am I getting "Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead." error?
I am writing this code and keep getting the error "Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead." no matter what I try. Do you see the problem in my code?
df = pd.read_csv('drain.csv')
values = df.values
seed =…

Kalina Scarbrough
1
vote
1 answer
StratifiedKFold split train and validation set size
I am using StratifiedKFold and am not sure what training and test sizes kfold.split returns in my code below. Assuming print(array.shape) returns (12904, 47), i.e. 12904 rows and 47 columns, what would be the…

learner
1
vote
1 answer
Python for loop iteration with multiple variables
I am performing k-fold validation on multiple datasets at the same time. I am using KFold from sklearn to do 10-fold validation. Basically, this partitions a dataset into 10 pieces, trains a classifier on 9 of those pieces, then tests the results…

mtrns
1
vote
1 answer
Creating Kfold cross validation set without sklearn
I am trying to split my data into K folds, each with a train and test set. I am stuck at the end:
I have an example data set:
[1,2,3,4,5,6,7,8,9,10]
I have successfully created the partition for 5-fold cross validation and the output is
fold=[[2,…

Shubham Bajaj
1
vote
1 answer
K folding using sklearn with specific clusters instead of spliting with specific size
I would like to do K-fold cross validation with sklearn in Python. My data has 8 users and I only do K-fold on the data of one user. Is it possible to do cross validation between the users? For instance, to use 7 users as a train set and 1 user as…

Anast Tzin
1
vote
1 answer
k-fold cross validation in RankLib
I want to do 5-fold cross validation on the MQ2008 dataset. I am using RankLib to apply an ML algorithm to the dataset. I am confused about the kcv option that RankLib provides for cross validation.
command used:
java -jar RankLib.jar -ranker 0 -train train.txt…

Neha
1
vote
1 answer
avoiding data leakage with timed data and cross validation
I'm using the Kobe Bryant Dataset.
I wish to predict the shot_made_flag with KnnRegressor.
I'm trying to avoid data leakage by grouping the data by season, year, and month.
season is a pre-existing column, and year and month are columns I've added…

Jorayen
1
vote
0 answers
Apply stratified 10-fold cross validation using random forest
I am a beginner in machine learning. My dataset is not normalized, but I will apply StandardScaler during processing. The target is multiclass (classes 1, 2, ..., 10).
I would like to know how to apply 10-fold cross-validation instead of…

ppatpk
1
vote
1 answer
Pyspark ML: how to get subModels values with CrossValidator()
I would like to get the cross-validation's (internal) training accuracy, using PySpark and the ML library:
lr = LogisticRegression()
param_grid = (ParamGridBuilder()
.addGrid(lr.regParam, [0.01, 0.5])
…

Simone
0
votes
0 answers
Optimize code using RepeatedStratifiedKFold
I'm running the following code:
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, train_test_split
from sklearn.metrics import roc_curve,…

Melanie
0
votes
0 answers
Manager said, "K fold validation won’t fix the test-train split issue. It is just for post-validation. Kindly read about correct splitting."
The initial 'Logistic Regression_Iris_Hyperparameter Tuning' is done in the code below because logistic regression on the Iris dataset was giving me an accuracy score of 1, which is wrong.
import pandas as pd
import matplotlib.pyplot as plt
import…

RISHAV BHARDWAJ
0
votes
0 answers
K-fold cross validation in PyTorch, augmenting train and valid data separately
I want to perform k-fold CV, but in my past approach, the augmentations were leaking into the validation dataset.
For this, I am using the WrapperDataset class I found in this post: Augmenting only the training set in K-folds cross validation.…

b_k_
0
votes
2 answers
Gaussian Naive Bayes gives weird results
This is a basic implementation of Gaussian Naive Bayes using sklearn. Can anyone tell me what I'm doing wrong here? My K-Fold CV results are a bit weird:
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from…

Questions123
0
votes
2 answers
How to train a model with kfold cv
I want to train an xgboost binary classifier. My training data with labels is in a txt file that has libsvms in it. I am working with an extremely imbalanced dataset, roughly 200 of one class and 66,000 of the other class. Due to that, an advisor…

sshen