A technique in cross-validation where the data is partitioned into k subsets (or "folds"): k-1 folds are used for training and the remaining fold for evaluation. The process is repeated k times, leaving out a different fold for evaluation each time.
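A minimal sketch of the procedure using scikit-learn's KFold with a small synthetic dataset (the estimator and data here are illustrative, not prescribed):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X = np.random.rand(100, 5)           # 100 samples, 5 features (synthetic)
y = np.random.randint(0, 2, 100)     # binary target (synthetic)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression()
    model.fit(X[train_idx], y[train_idx])            # train on k-1 folds
    preds = model.predict(X[test_idx])               # evaluate on the held-out fold
    scores.append(accuracy_score(y[test_idx], preds))
print(np.mean(scores))               # average score over the k folds
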
Questions tagged [k-fold]
284 questions
0
votes
0 answers
What's the correct way to format X and Y from a binary dataframe to use with Stratified K-Fold cross-validation?
My data is a dataframe of 25 columns and 2737 rows containing binary data.
The goal is to train using each row as an INPUT and get as an OUTPUT a probabilistic prediction of what the next sequence could be.
Data on this scenario is always…

Wisdom
- 121
- 1
- 1
- 13
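One way to lay out a binary dataframe for StratifiedKFold is to keep the feature columns in X and a single label column in y, then index by position; a hedged sketch for the question above, assuming the last column is the target (that layout is my assumption, not stated in the question):

import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.DataFrame(np.random.randint(0, 2, size=(2737, 25)))   # synthetic stand-in for the binary dataframe

X = df.iloc[:, :-1].values   # all but the last column as inputs
y = df.iloc[:, -1].values    # last column as the label (assumption)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
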
0
votes
0 answers
Getting an inaccurate number of rows when using the predict function in a cross-validation exercise
I'm performing a K-fold exercise with K = 10 for polynomials from degree 1 to 5, with the purpose of identifying which polynomial best fits the data provided. Nevertheless, when I try to predict Y-hat using the testing data (X-test), which…

Lucpi
- 1
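A frequent cause of a row-count mismatch here is predicting on the full dataset instead of the fold's test split; a sketch of the intended loop, assuming a 1-D predictor fitted with numpy polynomials (the data and shapes are assumptions):

import numpy as np
from sklearn.model_selection import KFold

x = np.random.rand(200)                                   # synthetic 1-D predictor
y = 2 * x ** 3 - x + np.random.normal(0, 0.1, 200)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
for degree in range(1, 6):
    mse = []
    for train_idx, test_idx in kf.split(x):
        coefs = np.polyfit(x[train_idx], y[train_idx], degree)
        y_hat = np.polyval(coefs, x[test_idx])            # predict only on X-test, so y_hat has len(test_idx) rows
        mse.append(np.mean((y[test_idx] - y_hat) ** 2))
    print(degree, np.mean(mse))
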
0
votes
0 answers
I am getting the following error: "You should leave random_state to its default (None), or set shuffle=True."
I'm trying to test several model families, using different algorithms to see if any perform well, and I want to compare the AUC and its standard deviation using K-fold cross-validation.
X = pd.concat([X_train, X_test])
y = pd.concat([y_train, y_test])
from…
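Recent scikit-learn versions raise this message when random_state is passed to KFold (or StratifiedKFold) while shuffle is left at its default of False; either enable shuffling or drop random_state:

from sklearn.model_selection import KFold

# KFold(n_splits=10, random_state=42) triggers the error above,
# because random_state has no effect unless the folds are shuffled.
kf = KFold(n_splits=10, shuffle=True, random_state=42)    # reproducible shuffled folds
# Alternatively: KFold(n_splits=10) for deterministic, unshuffled folds and no random_state.
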
0
votes
1 answer
How to do K-fold cross-validation without using Python libraries?
I am trying to do cross-validation; however, I am only allowed to use the libraries below (as the professor required):
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris
Therefore, I am not able to use KFold for…

Shan
- 3
- 5
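Folds can be built by hand with numpy index arithmetic, so only the permitted imports are needed; a sketch under that constraint (the splitting logic is mine, not from the question):

import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
k = 5
indices = np.random.permutation(len(X))      # shuffle the row indices once
folds = np.array_split(indices, k)           # k nearly equal blocks of indices

for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    clf = svm.SVC().fit(X[train_idx], y[train_idx])
    print(i, clf.score(X[test_idx], y[test_idx]))
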
0
votes
1 answer
k-fold cross validation in quanteda
I've been using the quanteda SML workflow as described in the quanteda tutorial (https://tutorials.quanteda.io/machine-learning/nb/) and found it extremely helpful to set up my own classification task. However, instead of the fixed held-out…

Max Overbeck
- 5
- 3
0
votes
1 answer
K-Folds cross-validator shows KeyError: None of Int64Index
I am trying to use the K-Folds cross-validator with a decision tree. I use a for loop to train and test data from KFold, like this code:
df = pd.read_csv(r'C:\\Users\data.csv')
# split data into X and y
X = df.iloc[:,:200]
Y = df.iloc[:,200]
X_train, X_test,…

user572575
- 1,009
- 3
- 25
- 45
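That KeyError typically appears when the positional indices returned by KFold.split are used to index the DataFrame directly (X[train_index]); indexing with .iloc avoids it. A sketch with a synthetic frame standing in for data.csv:

import numpy as np
import pandas as pd
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame(np.random.rand(100, 201))          # stand-in for the 201-column CSV
X = df.iloc[:, :200]
Y = df.iloc[:, 200].round().astype(int)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_index, test_index in kf.split(X):
    # X[train_index] would raise "KeyError: None of [Int64Index(...)] are in the [columns]"
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = Y.iloc[train_index], Y.iloc[test_index]
    model = DecisionTreeClassifier().fit(X_train, y_train)
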
0
votes
0 answers
How can I measure the probability of error of a trained model, in particular the random forest?
To perform binary classification on a set of images, I trained a random forest on a set of data.
I now want to evaluate the error probability of my model.
To do that, I did two things, and I don't know which one corresponds to this error probability:
I…

Sab
- 1
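One common estimate of a model's error probability is the cross-validated misclassification rate, i.e. one minus the mean k-fold accuracy; a sketch with a random forest on synthetic features (the actual image features are not reproduced here):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X = np.random.rand(300, 20)          # stand-in for extracted image features
y = np.random.randint(0, 2, 300)     # binary labels

acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=10, scoring="accuracy")
print("estimated error probability:", 1 - acc.mean())
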
0
votes
1 answer
Building a neural network using k-fold cross-validation
I am new to deep learning, trying to implement a neural network using 4-fold cross-validation for training, testing, and validating. The topic is to classify the vehicle using an existing dataset.
The accuracy result is 0.7.
Training Accuracy
An…

zed
- 11
- 3
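A minimal 4-fold loop around a small neural network, shown here with scikit-learn's MLPClassifier rather than any particular deep-learning framework (the vehicle dataset is replaced by synthetic data, so the 0.7 figure is not reproduced):

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

X = np.random.rand(400, 18)          # stand-in for the vehicle features
y = np.random.randint(0, 4, 400)     # four vehicle classes (assumption)

kf = KFold(n_splits=4, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X):
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    net.fit(X[train_idx], y[train_idx])
    scores.append(net.score(X[test_idx], y[test_idx]))
print("mean 4-fold accuracy:", np.mean(scores))
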
0
votes
0 answers
10-fold cross-validation for a logistic regression in Google Colab (Python)
y3_data is the death variable (0 for alive, 1 for dead); x3_data are my categorical variables, which all have binary outputs, for example Diabetes (0 for yes, 1 for no), and so on. I have around 6 variables in x3_data that have a significant P value with…

kjnk
- 19
- 3
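A sketch of 10-fold cross-validation for a logistic regression on binary predictors; the names x3_data and y3_data follow the question, but the data below is synthetic:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

x3_data = pd.DataFrame(np.random.randint(0, 2, size=(500, 6)),
                       columns=[f"var{i}" for i in range(6)])     # six binary predictors (synthetic)
y3_data = pd.Series(np.random.randint(0, 2, 500))                 # death variable: 0 alive, 1 dead (synthetic)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(), x3_data, y3_data, cv=cv, scoring="roc_auc")
print(scores.mean(), scores.std())
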
0
votes
0 answers
Should the same cross-validation method be used across multiple models?
The assignment is to write a simple ML program that trains and predicts on a dataset of our choice. I want to determine the best model for my data. The response is a class (0/1). I wrote code to try different cross-validation methods (validation…

Oliver
- 1,465
- 4
- 17
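One reasonable convention is to reuse a single splitter object with a fixed random_state for every candidate model, so each one is scored on exactly the same folds; a hedged sketch with two illustrative models:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X = np.random.rand(300, 10)
y = np.random.randint(0, 2, 300)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)   # identical folds for every model
for name, model in [("logreg", LogisticRegression()),
                    ("forest", RandomForestClassifier(random_state=0))]:
    scores = cross_val_score(model, X, y, cv=cv)
    print(name, scores.mean())
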
0
votes
0 answers
Can we apply cross-validation twice on the same dataset?
First, we split the dataset using the stratify parameter
train_test_split(np.array(X), y, train_size=TRAIN_SIZE, stratify=y, random_state=42)
and then apply KFold cross-validation
kfold = KFold(n_splits=num_folds, shuffle=True)
fold_no = 1
for train,…

Saghir Ahmed
- 7
- 3
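Combining the two is workable as long as the k-fold loop only ever sees the training portion, leaving the held-out test set untouched for a final evaluation; a sketch of that arrangement (the model choice is illustrative):

import numpy as np
from sklearn.model_selection import train_test_split, KFold
from sklearn.linear_model import LogisticRegression

X = np.random.rand(500, 8)
y = np.random.randint(0, 2, 500)

# First split: keep a stratified hold-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=42)

# Second stage: k-fold cross-validation on the training portion only.
kfold = KFold(n_splits=5, shuffle=True)
for fold_no, (train, val) in enumerate(kfold.split(X_train), start=1):
    model = LogisticRegression().fit(X_train[train], y_train[train])
    print("fold", fold_no, "validation accuracy:", model.score(X_train[val], y_train[val]))

# Final evaluation on the untouched test set.
print("test accuracy:", LogisticRegression().fit(X_train, y_train).score(X_test, y_test))
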
0
votes
0 answers
Imbalanced categorical predictors cross validation with continuous target
I am working on a project where I want to measure the predictive performance of some categorical variables on click-through rate (continuous). However, the categorical variables are highly imbalanced:
packaged_goods: 796
food: 104
person:…

donhendriko
- 1
- 1
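One hedged way to gauge how predictive such imbalanced categories are of a continuous click-through rate is to one-hot encode them and k-fold cross-validate a simple regressor, reporting R² per fold; the first two category counts below mirror the question, everything else (including the third count) is synthetic:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

cats = ["packaged_goods"] * 796 + ["food"] * 104 + ["person"] * 50   # third count is a placeholder
ctr = np.random.rand(len(cats))                   # continuous click-through rate (synthetic)
X = pd.get_dummies(pd.Series(cats))               # one-hot encode the categorical predictor

r2 = cross_val_score(LinearRegression(), X, ctr, cv=5, scoring="r2")
print(r2.mean(), r2.std())
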
0
votes
0 answers
How to see the indices of the split on the data that GridSearchCV used when it made the split?
When using GridSearchCV() to perform a k-fold cross-validation analysis on some data, is there a way to know which data was used for each split?
For example, assume the goal is to build a binary classifier of your choosing, named 'model'. There are…

jensenn
- 1
- 1
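GridSearchCV does not record the fold indices itself, but it accepts either a splitter with a fixed random_state or a pre-materialised list of (train, test) index pairs, and in the latter case the exact splits are known up front; a sketch of that idea:

import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

X = np.random.rand(200, 5)
y = np.random.randint(0, 2, 200)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
splits = list(cv.split(X, y))                  # materialise the (train, test) index pairs

model = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=splits)   # cv accepts an iterable of splits
model.fit(X, y)

for i, (train_idx, test_idx) in enumerate(splits):
    print("split", i, "first 5 test rows:", test_idx[:5])
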
0
votes
1 answer
How to split the dataset into multiple folds while keeping the ratio of an attribute fixed
Let's say that I have a dataset with multiple input features and one single output. For the sake of simplicity, let's say the output is binary. Either zero or one.
I want to split this dataset into k parts and use a k-fold cross-validation model to…

Mehran
- 15,593
- 27
- 122
- 221
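For a binary output this is exactly what StratifiedKFold does: the class ratio is kept roughly constant in every fold; a minimal sketch with synthetic data:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(1000, 6)
y = (np.random.rand(1000) < 0.2).astype(int)     # roughly 20% ones

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    print("fold positive rate:", y[test_idx].mean())   # stays close to 0.2 in each fold
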
0
votes
0 answers
Creating a random forest function
I am trying to create a function that takes a 2-d numpy array (i.e. the data) and data_indices (a list of (train_indices, test_indices) tuples) as input. For each (train_indices, test_indices) tuple in data_indices, the function should:
Train a new…

Dushu
- 31
- 4
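A hedged sketch of such a function, assuming the last column of the 2-D array holds the labels and that per-fold test accuracies should be returned (both assumptions, since the question is truncated):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def random_forest_cv(data, data_indices):
    # Train a fresh forest for each (train_indices, test_indices) tuple.
    X, y = data[:, :-1], data[:, -1]             # last column as the label (assumption)
    scores = []
    for train_indices, test_indices in data_indices:
        model = RandomForestClassifier(random_state=0)
        model.fit(X[train_indices], y[train_indices])
        scores.append(model.score(X[test_indices], y[test_indices]))
    return scores

# Example usage with synthetic data and simple index folds.
data = np.hstack([np.random.rand(120, 4), np.random.randint(0, 2, (120, 1))])
folds = np.array_split(np.arange(120), 4)
pairs = [(np.concatenate([f for j, f in enumerate(folds) if j != i]), folds[i]) for i in range(4)]
print(random_forest_cv(data, pairs))
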