Questions tagged [k-fold]

A cross-validation technique in which the data is partitioned into k subsets (or "folds"). In each iteration, k-1 folds are used for training and the remaining fold for evaluation; the process is repeated k times, leaving out a different fold for evaluation each time.
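
A minimal sketch of the procedure, using scikit-learn's KFold with illustrative data and a placeholder model:

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    X = np.random.rand(100, 5)          # illustrative feature matrix
    y = np.random.randint(0, 2, 100)    # illustrative binary target

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in kf.split(X):
        # train on k-1 folds, evaluate on the held-out fold
        model = LogisticRegression().fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

    print(np.mean(scores))  # average score over the k folds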

284 questions
0 votes, 0 answers

What's the correct way to format X and Y from a binary dataframe to use with Stratified K-Fold cross-validation

My data is a dataframe of 25 columns and 2737 rows containing binary data. The goal is to train using each row as an INPUT and get as an OUTPUT a probabilistic prediction of what the next sequence could be. Data in this scenario is always…
Wisdom • 121 • 1 • 1 • 13
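
A minimal sketch of one way to lay X and Y out for StratifiedKFold, assuming (purely for illustration, since the excerpt does not say) that the last column of the dataframe is the target:

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import StratifiedKFold

    # Illustrative binary dataframe: 25 columns, last one treated as the target here.
    df = pd.DataFrame(np.random.randint(0, 2, size=(2737, 25)))

    X = df.iloc[:, :-1].to_numpy()   # features: all but the last column
    y = df.iloc[:, -1].to_numpy()    # target: must be a 1-D array for stratification

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        # fit a probabilistic classifier on (X_train, y_train) here
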
0 votes, 0 answers

Getting an inaccurate number of rows when using the predict function in a cross-validation exercise

I'm performing a K-fold exercise with K = 10 for polynomials of degree 1 to 5, with the purpose of identifying which polynomial best fits the provided data. Nevertheless, when I try to predict Y-hat using the testing data (X-test), which…
Lucpi • 1
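
A minimal sketch of comparing polynomial degrees with K = 10 folds; the data, the use of PolynomialFeatures, and the mean-squared-error metric are illustrative assumptions, not the asker's code:

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import mean_squared_error

    x = np.random.rand(200, 1)                                    # illustrative predictor
    y = np.sin(2 * np.pi * x).ravel() + 0.1 * np.random.randn(200)

    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    for degree in range(1, 6):
        errors = []
        for train_idx, test_idx in kf.split(x):
            model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
            model.fit(x[train_idx], y[train_idx])
            y_hat = model.predict(x[test_idx])    # one prediction per test row, so shapes match
            errors.append(mean_squared_error(y[test_idx], y_hat))
        print(degree, np.mean(errors))
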
0 votes, 0 answers

I am getting the following error: "You should leave random_state to its default (None), or set shuffle=True."

I'm trying to test several model families, using different algorithms to see if any perform well, and I want to compare AUC and its standard deviation using K-Fold cross-validation. X = pd.concat([X_train, X_test]) y = pd.concat([y_train, y_test]) from…
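
For context, recent scikit-learn versions raise this error when random_state is set on a splitter whose shuffle is left at its default of False; a minimal sketch of the two valid configurations:

    from sklearn.model_selection import StratifiedKFold

    # Raises the quoted error in recent scikit-learn versions, because
    # shuffle defaults to False and random_state then has no effect:
    # bad_cv = StratifiedKFold(n_splits=5, random_state=42)

    # Either shuffle with a fixed seed...
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

    # ...or keep the default, unshuffled, deterministic splits.
    cv_plain = StratifiedKFold(n_splits=5)
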
0 votes, 1 answer

How to do K-fold cross-validation without using Python libraries?

I am trying to do cross-validation; however, I am only allowed to use the libraries below (as the professor demanded): import numpy as np from sklearn import svm from sklearn.datasets import load_iris Therefore, I am not able to use KFold for…
Shan • 3 • 5
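
A minimal sketch of a hand-rolled K-fold loop using only NumPy and the allowed scikit-learn imports (a sketch, not the accepted answer):

    import numpy as np
    from sklearn import svm
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)
    k = 5

    rng = np.random.default_rng(0)
    indices = rng.permutation(len(X))          # shuffle once, then cut into k folds
    folds = np.array_split(indices, k)

    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])   # all folds except the i-th
        clf = svm.SVC().fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))

    print(np.mean(scores))
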
0 votes, 1 answer

k-fold cross validation in quanteda

I've been using the quanteda SML workflow as described in the quanteda tutorial (https://tutorials.quanteda.io/machine-learning/nb/) and found it extremely helpful to set up my own classification task. However, instead of the fixed held-out…
0 votes, 1 answer

K-Folds cross-validator shows KeyError: None of Int64Index

I'm trying to use the K-Folds cross-validator with a decision tree. I use a for loop to train and test data from KFold like this code. df = pd.read_csv(r'C:\\Users\data.csv') # split data into X and y X = df.iloc[:,:200] Y = df.iloc[:,200] X_train, X_test,…
user572575 • 1,009 • 3 • 25 • 45
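
A common cause of that KeyError is indexing a DataFrame with the positional indices KFold yields; a minimal sketch with illustrative data in place of the asker's CSV:

    import numpy as np
    import pandas as pd
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    # Illustrative stand-in for the asker's data: 200 feature columns plus a label column.
    df = pd.DataFrame(np.random.randint(0, 2, size=(100, 201)))
    X = df.iloc[:, :200]
    Y = df.iloc[:, 200]

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in kf.split(X):
        # KFold yields positional indices, so pandas objects must be indexed with .iloc;
        # X[train_idx] would raise the "None of [Int64Index(...)] are in the [columns]" KeyError.
        X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]
        Y_train, Y_test = Y.iloc[train_idx], Y.iloc[test_idx]
        tree = DecisionTreeClassifier().fit(X_train, Y_train)
        print(tree.score(X_test, Y_test))
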
0 votes, 0 answers

How can I measure the probability of error of a trained model, in particular a random forest?

For the binary classification of a set of images, I trained a random forest on a set of data. I now want to evaluate the error probability of my model. For that, I did two things, and I don't know which corresponds to this error probability: I…
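
One common way to estimate such an error probability is the cross-validated misclassification rate; a minimal sketch with illustrative data (this is one interpretation, not necessarily what the asker tried):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X = np.random.rand(300, 20)           # illustrative image features
    y = np.random.randint(0, 2, 300)      # illustrative binary labels

    acc = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                          cv=10, scoring="accuracy")
    error_estimate = 1.0 - acc.mean()     # cross-validated misclassification rate
    print(error_estimate)
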
0 votes, 1 answer

Building a neural network using k-fold cross-validation

I am new to deep learning and am trying to implement a neural network using 4-fold cross-validation for training, testing, and validating. The topic is to classify vehicles using an existing dataset. The accuracy result is 0.7. Training Accuracy An…
zed • 11 • 3
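
A minimal sketch of 4-fold cross-validation around a small neural network; scikit-learn's MLPClassifier and the random data are stand-ins, since the asker's framework and dataset are not shown:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    X = np.random.rand(400, 30)            # illustrative vehicle features
    y = np.random.randint(0, 4, 400)       # illustrative vehicle classes

    skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
        # Build a fresh network per fold so no weights leak between folds.
        net = make_pipeline(StandardScaler(),
                            MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0))
        net.fit(X[train_idx], y[train_idx])
        print(f"fold {fold}: accuracy = {net.score(X[test_idx], y[test_idx]):.3f}")
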
0 votes, 0 answers

10-fold cross-validation for a logistic regression in Google Colab (Python)

y3_data is the death variable (0 for alive, 1 for dead); x3_data are my categorical variables, which all have binary values, for example Diabetes (0 for yes, 1 for no), and so on. I have around 6 variables in x3_data that have a significant P value with…
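
A minimal sketch of 10-fold cross-validation of a logistic regression; the x3_data/y3_data stand-ins below are illustrative, not the asker's data:

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Illustrative stand-ins: 6 binary predictors and a 0/1 death variable.
    x3_data = pd.DataFrame(np.random.randint(0, 2, size=(500, 6)),
                           columns=[f"var{i}" for i in range(6)])
    y3_data = np.random.randint(0, 2, 500)

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    scores = cross_val_score(LogisticRegression(max_iter=1000), x3_data, y3_data,
                             cv=cv, scoring="accuracy")
    print(scores.mean(), scores.std())
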
0 votes, 0 answers

Should the same cross-validation method be used across multiple models?

The assignment is to write a simple ML program that trains and predicts on a dataset of our choice. I want to determine the best model for my data. The response is a class (0/1). I wrote code to try different cross-validation methods (validation…
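
A minimal sketch of the usual practice of reusing one fixed splitter for every candidate model, so all candidates are scored on identical folds (model choices and data are illustrative):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier

    X = np.random.rand(300, 10)           # illustrative features
    y = np.random.randint(0, 2, 300)      # binary response (0/1)

    # One fixed, seeded splitter keeps the comparison fair across models.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    models = {
        "logreg": LogisticRegression(max_iter=1000),
        "tree": DecisionTreeClassifier(random_state=0),
        "forest": RandomForestClassifier(random_state=0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        print(name, scores.mean())
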
0 votes, 0 answers

Can we apply cross-validation twice on the same dataset?

First, we split the dataset using the stratify parameter: train_test_split(np.array(X), y, train_size=TRAIN_SIZE, stratify=y, random_state=42), and then apply KFold cross-validation: kfold = KFold(n_splits=num_folds, shuffle=True) fold_no = 1 for train,…
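
A minimal sketch of the pattern described: a stratified hold-out split first, then KFold only on the training portion (data and model are illustrative):

    import numpy as np
    from sklearn.model_selection import train_test_split, KFold
    from sklearn.linear_model import LogisticRegression

    X = np.random.rand(500, 8)
    y = np.random.randint(0, 2, 500)

    # First split off a stratified hold-out test set...
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, stratify=y, random_state=42)

    # ...then cross-validate only on the training portion; the hold-out set
    # stays untouched until a final evaluation.
    kfold = KFold(n_splits=5, shuffle=True, random_state=42)
    for fold_no, (train_idx, val_idx) in enumerate(kfold.split(X_train), start=1):
        model = LogisticRegression().fit(X_train[train_idx], y_train[train_idx])
        print(f"fold {fold_no}: {model.score(X_train[val_idx], y_train[val_idx]):.3f}")
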
0 votes, 0 answers

Imbalanced categorical predictors: cross-validation with a continuous target

I am working on a project where I want to measure the predictive performance of some categorical variables on click-through rate (continuous). However, the categorical variables are highly imbalanced: packaged_goods: 796 food: 104 person:…
0 votes, 0 answers

How to see the indices of the splits that GridSearchCV used on the data?

When using GridSearchCV() to perform a k-fold cross-validation analysis on some data, is there a way to know which data was used for each split? For example, assume the goal is to build a binary classifier of your choosing, named 'model'. There are…
jensenn • 1 • 1
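
A minimal sketch of one way to recover the indices: pass an explicit, seeded splitter to GridSearchCV and iterate the same splitter afterwards, which regenerates the same folds because the seed is fixed (estimator and data are illustrative):

    import numpy as np
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.svm import SVC

    X = np.random.rand(200, 5)
    y = np.random.randint(0, 2, 200)

    # Pass an explicit splitter instead of cv=5 so the folds are reproducible.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=cv)
    search.fit(X, y)

    # Re-running the seeded splitter yields the indices GridSearchCV used per fold.
    for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
        print(fold, train_idx[:5], test_idx[:5])
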
0 votes, 1 answer

How to split the dataset into multiple folds while keeping the ratio of an attribute fixed

Let's say that I have a dataset with multiple input features and one single output. For the sake of simplicity, let's say the output is binary. Either zero or one. I want to split this dataset into k parts and use a k-fold cross-validation model to…
Mehran • 15,593 • 27 • 122 • 221
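
A minimal sketch showing that StratifiedKFold keeps the ratio of a binary output roughly constant across folds (data is illustrative):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    X = np.random.rand(1000, 4)
    y = (np.random.rand(1000) < 0.2).astype(int)   # roughly 20% ones overall

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for fold, (train_idx, test_idx) in enumerate(skf.split(X, y), start=1):
        # Stratification preserves the class ratio in both partitions of every fold.
        print(f"fold {fold}: train ratio={y[train_idx].mean():.3f}, "
              f"test ratio={y[test_idx].mean():.3f}")
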
0 votes, 0 answers

Creating a random forest function

I am trying to create a function that takes a 2-D numpy array (i.e. the data) and data_indices (a list of (train_indices, test_indices) tuples) as input. For each (train_indices, test_indices) tuple in data_indices, the function should: Train a new…
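
A minimal sketch of such a function, assuming (only for illustration, since the excerpt does not say) that the last column of the array holds the label:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import KFold

    def evaluate_forest(data, data_indices):
        """Train a fresh random forest per (train_indices, test_indices) tuple
        and return the test accuracy of each fold.

        Assumes the last column of `data` holds the label (an illustrative choice)."""
        X, y = data[:, :-1], data[:, -1]
        scores = []
        for train_indices, test_indices in data_indices:
            forest = RandomForestClassifier(random_state=0)
            forest.fit(X[train_indices], y[train_indices])
            scores.append(forest.score(X[test_indices], y[test_indices]))
        return scores

    # Example usage with KFold-generated index tuples on illustrative data.
    data = np.column_stack([np.random.rand(150, 4), np.random.randint(0, 2, 150)])
    indices = list(KFold(n_splits=5, shuffle=True, random_state=0).split(data))
    print(evaluate_forest(data, indices))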