Questions tagged [k-fold]

A technique in cross-validation where the data is partitioned into k subsets (or "folds"). In each round, k-1 folds are used for training and the one remaining fold for evaluation. The process is repeated k times, holding out a different fold for evaluation each time, so every fold serves as the evaluation set exactly once.
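The partitioning described above can be sketched in plain Python. This is a minimal illustration, not how any particular library implements it: the helper name `k_fold_indices` is hypothetical, and it assumes contiguous, unshuffled folds whose sizes differ by at most one.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds.

    Hypothetical helper for illustration; no shuffling or stratification.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        # The current fold is held out; every other index is used for training.
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each index lands in the held-out fold exactly once across the k rounds. In practice the data is usually shuffled (or stratified by class) before being split.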

284 questions
1
vote
1 answer

In GroupKFold in ScikitLearn error message: ValueError: too many values to unpack (expected 2)

When using the GroupKFold method from scikit-learn, I am getting an error message which I cannot understand given the documentation. The error message is: ValueError: too many values to unpack (expected 2). The documentation states: For a…
user8270077
1
vote
1 answer

Why am I getting "Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead." error?

I am writing this code and keep getting the Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead. error no matter what I try. Do you see the problem within my code? df = pd.read_csv('drain.csv') values = df.values seed =…
1
vote
1 answer

StratifiedKFold split train and validation set size

I am using StratifiedKFold and I am not sure what training and test sizes are returned by kfold.split in my code below. Assuming print(array.shape) returns (12904, 47), i.e. the number of rows is 12904 and the number of columns is 47, what would be the…
learner
1
vote
1 answer

Python for loop iteration with multiple variables

I am performing k-fold validation on multiple datasets at the same time. I am using KFold from sklearn to do 10-fold validation. Basically, this partitions a dataset into 10 pieces, trains a classifier on 9 of those pieces, and then tests the results…
mtrns
1
vote
1 answer

Creating Kfold cross validation set without sklearn

I am trying to split my data into K folds, each with a train and test set. I am stuck at the end. I have an example data set: [1,2,3,4,5,6,7,8,9,10]. I have successfully created the partition for 5-fold cross validation and the output is fold=[[2,…
Shubham Bajaj
1
vote
1 answer

K folding using sklearn with specific clusters instead of splitting with specific size

I would like to do a K-fold cross validation with sklearn in Python. My data has 8 users and I only do K-fold on the data of one user. Is it possible to do cross validation between the users? For instance, to use 7 users as a train set and 1 user as…
Anast Tzin
1
vote
1 answer

k-fold cross validation in RankLib

I want to do 5-fold cross validation on the MQ2008 dataset. I am using RankLib to apply an ML algorithm to the dataset. I am confused about the kcv option given in RankLib for cross validation. Command used: java -jar RankLib.jar -ranker 0 -train train.txt…
Neha
1
vote
1 answer

avoiding data leakage with timed data and cross validation

I'm using the Kobe Bryant Dataset. I wish to predict the shot_made_flag with KnnRegressor. I'm trying to avoid data leakage by grouping the data by season, year, and month. season is a pre-existing column, and year and month are columns I've added…
Jorayen
1
vote
0 answers

Apply stratified 10-fold cross validation using random forest

I am a beginner in machine learning. My dataset is not normalized, but I will use StandardScaler in the process. I have a multiclass target (classes 1, 2, ..., 10). I would like to know how to apply 10-fold cross-validation instead of…
1
vote
1 answer

Pyspark ML: how to get subModels values with CrossValidator()

I would like to get the cross-validation's (internal) training accuracy, using PySpark and the ML library: lr = LogisticRegression() param_grid = (ParamGridBuilder() .addGrid(lr.regParam, [0.01, 0.5]) …
Simone
0
votes
0 answers

Optimize code using RepeatedStratifiedKFold

I'm running the following code: import numpy as np import pandas as pd from sklearn.dummy import DummyClassifier from sklearn.model_selection import RepeatedStratifiedKFold, train_test_split from sklearn.metrics import roc_curve,…
0
votes
0 answers

Manager said, "K fold validation won’t fix the test-train split issue. It is just for post-validation. Kindly read about correct splitting."

The initial 'Logistic Regression_Iris_Hyperparameter Tuning' is done in the code below because logistic regression on the Iris data set was giving me an accuracy score of 1, which is wrong. import pandas as pd import matplotlib.pyplot as plt import…
0
votes
0 answers

K-fold cross validation in PyTorch, augmenting train and valid data separately

I want to perform k-fold CV, but in my past approach the augmentations were leaking into the validation dataset. For this, I am using the WrapperDataset class I found in this post: Augmenting only the training set in K-folds cross validation.…
0
votes
2 answers

Gaussian Naive Bayes gives weird results

This is a basic implementation of Gaussian Naive Bayes using sklearn. Can anyone tell me what I'm doing wrong here? My K-Fold CV results are a bit weird: import numpy as np import pandas as pd from sklearn.naive_bayes import GaussianNB from…
0
votes
2 answers

How to train a model with kfold cv

I want to train an xgboost binary classifier. My training data with labels is in a txt file that has libsvms in it. I am working with an extremely imbalanced dataset, roughly 200 of one class and 66,000 of the other class. Due to that, an advisor…
sshen