Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that each data point has a chance of being validated against. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
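As a minimal illustration of k-fold cross-validation, here is a sketch using scikit-learn's KFold on a small synthetic dataset, with LogisticRegression as an arbitrary stand-in estimator:

```python
# Minimal k-fold cross-validation sketch: 5 folds, one accuracy score per fold.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X = np.arange(40).reshape(20, 2)      # 20 samples, 2 features (synthetic)
y = np.array([0, 1] * 10)             # alternating binary labels

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(scores)                         # array of 5 per-fold scores
```

Each round trains on four folds and validates on the remaining one, so every sample is validated against exactly once.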

2604 questions
0 votes • 0 answers

How do I run multiple jobs in an xgboost model when I combine multiple scripts to call different functions?

I have a library of different functions, for example my xgboost model. I am doing a forecast analysis and thus have a separate script where I import the functions to forecast. I am doing cross-validation on the model for a grid of parameter values,…
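Without the full scripts it is hard to say where the parallelism should go, but one common pattern is to let the cross-validated grid search itself fan out over jobs via n_jobs. A sketch with scikit-learn's GridSearchCV, using GradientBoostingClassifier as a stand-in for xgboost's sklearn-compatible XGBClassifier:

```python
# Sketch: parallel cross-validated grid search via n_jobs.
# GradientBoostingClassifier stands in for xgboost's XGBClassifier,
# which exposes the same scikit-learn estimator interface.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=3,
    n_jobs=-1,   # run the (parameter, fold) combinations in parallel
)
search.fit(X, y)
print(search.best_params_)
```

Because the search object handles the parallel jobs, the imported forecasting functions only need to call `search.fit` once.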
0 votes • 0 answers

How can you use sklearn-compatible cross-validation functions (e.g. BayesSearchCV from scikit-optimize) to find the optimal penalty term in pyGAM?

The problem I'm having is that with pyGAM, the class with the fit method (e.g. LinearGAM) does not have an argument which specifies the penalty term. Instead you need to specify lam as an argument to your individual splines (e.g. s(0, lam=0.1)). This…
tobmo • 25 • 3
0 votes • 0 answers

What is the difference between block bootstrapping and group k-fold cross-validation?

I know that block bootstrapping is a technique used to resample time series data as it can preserve time dependencies within the data. In particular, if the block size is one month, then data within each month is not reshuffled but the blocks…
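One concrete way to see the difference: group k-fold only controls which rows land in which fold, it never resamples the data, whereas block bootstrapping draws blocks with replacement. A sketch with scikit-learn's GroupKFold, treating each "month" as a group:

```python
# Sketch: GroupKFold keeps all rows of a group (here, a month) together,
# so no month is split across the training and validation sides.
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(24).reshape(12, 2)        # 12 rows of synthetic data
y = np.arange(12)
months = np.repeat([1, 2, 3, 4], 3)     # 4 "months", 3 rows each

gkf = GroupKFold(n_splits=4)
for train_idx, test_idx in gkf.split(X, y, groups=months):
    # each month appears on exactly one side of every split
    assert set(months[train_idx]).isdisjoint(months[test_idx])
```

Unlike a block bootstrap, each row appears exactly once across the test folds; nothing is drawn with replacement.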
0 votes • 0 answers

Imputing missing values in nested GridSearchCV pipeline to avoid data leakage

I am having some issues with sklearn's way of imputing values inside the established CV and Pipeline frameworks. All of this is to avoid global imputation, which would bias the model's performance estimate due to data leakage. Looking around at several…
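The usual leakage-free pattern is to put the imputer inside the pipeline that GridSearchCV cross-validates, so the fill values are recomputed from each training split only. A minimal sketch, assuming mean imputation via SimpleImputer and a logistic-regression stand-in model:

```python
# Sketch: an imputer inside the Pipeline is re-fit on each CV training
# split, so validation rows never leak into the computed fill values.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[rng.random(X.shape) < 0.1] = np.nan   # punch ~10% missing values
y = (rng.random(100) > 0.5).astype(int)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("clf", LogisticRegression()),
])
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Imputing globally before the search would compute the means from all rows, including the held-out ones, which is exactly the leakage being avoided.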
0 votes • 1 answer

NotFittedError (instance is not fitted yet) after invoking cross_validate

This is my minimal reproducible example:
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_validate
x = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])
y = [1, 0, 0, 1]
model = …
tail • 355 • 2 • 11
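The usual explanation for this error is that cross_validate fits clones of the estimator and leaves the original object untouched. A sketch built on the same kind of example, showing both workarounds (return_estimator=True, or fitting the model explicitly afterwards):

```python
# Sketch: cross_validate fits *clones*, so `model` itself stays unfitted.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_validate

x = np.array([[1, 2], [3, 4], [5, 6], [6, 7]])
y = [1, 0, 0, 1]

model = GaussianNB()
results = cross_validate(model, x, y, cv=2, return_estimator=True)
fitted_per_fold = results["estimator"]   # these per-fold copies ARE fitted
fitted_per_fold[0].predict(x)            # works, no NotFittedError

model.fit(x, y)                          # or fit the original explicitly
model.predict(x)                         # now this works too
```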
0 votes • 0 answers

How can I ensure that the nestcv.train function returns the same results each time? set.seed() is not working

# Convert x and xnew data frames into matrix format, and convert response
# column into factor
x <- data.matrix(x)
xnew <- data.matrix(xnew)
y_sub <- ifelse(y == 1, "Class1", "Class0")
y_sub <- as.factor(y_sub)
# SVM with linear kernel - nested…
David • 1 • 2
0 votes • 0 answers

tensorflow.python.framework.errors_impl.FailedPreconditionError: keras_tuner/untitled_project/trial_1; Directory not empty

I am trying to use keras_tuner with cross-validation for hyperparameter optimization. My code looks as follows:
for i in range(5):
    train_df = df[df['fold'] != i]
    valid_df = df[df['fold'] == i]
    …
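The "Directory not empty" message suggests the tuner is reusing the same project directory across folds; giving each fold its own directory or project_name is a common workaround (this is an inference from the error message, since the excerpt's tuner code is cut off). The fold-column split itself can be sketched in plain pandas:

```python
# Sketch of the fold-column split pattern from the excerpt: each iteration
# holds out the rows whose precomputed 'fold' value equals i.
import pandas as pd

df = pd.DataFrame({"x": range(10), "fold": [0, 1, 2, 3, 4] * 2})

for i in range(5):
    train_df = df[df["fold"] != i]
    valid_df = df[df["fold"] == i]
    # build a fresh tuner here, e.g. with a per-fold project name such as
    # project_name=f"fold_{i}" (hypothetical, depends on your tuner setup)
    assert len(train_df) + len(valid_df) == len(df)
```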
0 votes • 0 answers

10 Fold Cross Validation Training Delay After First Fold

I have been training and testing a model using 10 fold cross validation which I have implemented myself. The first fold iteration went just fine, I would train on the last nine folds ([1:10]) and test on the first fold ([0]). But after this stage I…
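Without the implementation it is hard to diagnose the delay, but one frequent pitfall in hand-rolled cross-validation is carrying a trained model object over into the next fold. A sketch that gives every fold a fresh copy via sklearn.base.clone (SGDClassifier is a hypothetical stand-in for the model in question):

```python
# Sketch of a hand-rolled k-fold loop where every fold gets a cold start:
# sklearn.base.clone returns an unfitted copy with the same parameters.
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import KFold

X = np.random.default_rng(0).normal(size=(100, 5))
y = (X[:, 0] > 0).astype(int)

template = SGDClassifier(random_state=0)
scores = []
for train_idx, test_idx in KFold(n_splits=10).split(X):
    model = clone(template)              # fresh, unfitted copy per fold
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))
print(len(scores))                       # one score per fold
```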
0 votes • 0 answers

Parallelize loop cross-validation

I'm attempting to parallelize a loop for conducting manual 10-fold cross-validation with RF. However, all that happens is that the PC's memory overloads and I'm forced to quit R. I don't understand what the problem might be; the code seems…
0 votes • 0 answers

Problems dividing multilevel data into folds for cross validation using the perry package

I am trying to assess the performance of a generalized linear mixed model I have built using behavior and performance data from dairy cows. The data has a nested structure (cows within farms) which I have tried to reflect in the model. I would like…
ILo • 1 • 1
0 votes • 0 answers

k-fold cross-validation in segmentation by CNN

I wrote code for the segmentation of iris images and got relatively good results, but I need to do better. I want to use k-fold cross-validation.…
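One minimal way to bolt k-fold cross-validation onto an existing segmentation pipeline is to split the list of image files, not the loaded tensors, and rebuild the model from scratch for each fold. A sketch with hypothetical file names:

```python
# Sketch: k-fold over image file names; each fold would train a fresh
# segmentation CNN on train_files and evaluate on val_files.
from sklearn.model_selection import KFold

image_names = [f"iris_{i:03d}.png" for i in range(20)]   # hypothetical files

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kf.split(image_names)):
    train_files = [image_names[i] for i in train_idx]
    val_files = [image_names[i] for i in val_idx]
    # build and train a fresh CNN on train_files, evaluate on val_files
```

Splitting at the file level keeps memory usage flat and guarantees no image appears in both sides of a fold.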
0 votes • 1 answer

Why is the ROC AUC score from a regular cross-validation very different from the ROC AUC score after hyperparameter tuning?

I'm evaluating an XGBoost classifier. I split the dataset into train and validation sets, perform a cross-validation with the model's default implementation using the train set, and compute the ROC AUC:
xgbClassCV = XGBClassifier()
kfold = …
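For such a comparison to be meaningful, both ROC AUC numbers should come from the same folds and the same scorer; a mismatch there is a common source of large gaps. A sketch of the like-for-like pattern, with LogisticRegression standing in for XGBClassifier:

```python
# Sketch: default model's cross-validated ROC AUC vs. the best grid-search
# ROC AUC, computed on the SAME folds with the SAME scoring string.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

default_auc = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=kfold, scoring="roc_auc"
).mean()

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": [0.01, 0.1, 1.0, 10.0]},       # grid includes the default C=1.0
    cv=kfold,
    scoring="roc_auc",
)
search.fit(X, y)
tuned_auc = search.best_score_           # directly comparable to default_auc

print(default_auc, tuned_auc)
```

With the default parameter value inside the grid and identical folds, the tuned score can never fall below the default one; a large unexplained gap usually means the two numbers were not computed on the same splits or metric.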
0 votes • 0 answers

`cross_validate` not returning full pipeline

I have created a pipeline which looks like this -
Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('numerical_transform',
                                                  RobustScaler(), …
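If the goal is to get the fitted pipelines back, cross_validate(..., return_estimator=True) returns the per-fold fitted copies of the whole Pipeline, preprocessor included. A minimal sketch loosely mirroring the excerpt, with a RobustScaler step:

```python
# Sketch: return_estimator=True yields one fully fitted Pipeline per fold.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

X = np.random.default_rng(0).normal(size=(60, 3))
y = (X.sum(axis=1) > 0).astype(int)

pipe = Pipeline([("scale", RobustScaler()), ("clf", LogisticRegression())])
results = cross_validate(pipe, X, y, cv=3, return_estimator=True)

for fitted_pipe in results["estimator"]:
    # every returned object is a complete fitted Pipeline, so its
    # preprocessing step carries fitted attributes like center_
    assert fitted_pipe.named_steps["scale"].center_ is not None
```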
0 votes • 0 answers

How to use Leave One Group Out as Cross Validation for Feature Selection?

I have 16 CSV files, and each file contains around 11,250 rows with 19 features and one column for labels. I want to implement Leave One Group Out cross-validation for feature selection algorithms like Sequential Forward Selection and Mutual…
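scikit-learn's SequentialFeatureSelector accepts either a CV splitter or a pre-materialized list of splits, so LeaveOneGroupOut (one group per CSV file) can drive the selection. A small sketch with synthetic data standing in for the 16 files:

```python
# Sketch: LeaveOneGroupOut folds (one "file" held out per round) passed
# as the CV scheme to SequentialFeatureSelector.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
groups = np.repeat(np.arange(4), 30)     # 4 "files", 30 rows each

logo = LeaveOneGroupOut()
splits = list(logo.split(X, y, groups))  # materialize the group-wise folds
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000), n_features_to_select=2, cv=splits
)
sfs.fit(X, y)
print(sfs.get_support())                 # Boolean mask of selected features
```

Materializing the splits up front is the simplest way to hand group-aware folds to an estimator whose `cv` argument has no `groups` parameter of its own.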
0 votes • 1 answer

Error: "Boolean array expected for the condition, not float64" during StratifiedKFold

I'm trying to use stratified k-fold cross-validation on my dataset, but I get the error "Boolean array expected for the condition, not float64" (in the code below). Does anyone know the reason? This is the code:
import pandas as…
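For reference, here is plain StratifiedKFold usage with positional (iloc) indexing. The "Boolean array expected" error usually points to indexing a DataFrame with a float-valued array somewhere in the surrounding code, not to StratifiedKFold itself (an inference from the error message, since the excerpt's code is cut off):

```python
# Sketch: StratifiedKFold with positional indexing via iloc, which avoids
# the "Boolean array expected for the condition" pitfall of masking a
# DataFrame with a non-Boolean (e.g. float) array.
import pandas as pd
from sklearn.model_selection import StratifiedKFold

df = pd.DataFrame({"f1": range(10), "f2": range(10, 20),
                   "label": [0, 1] * 5})
X, y = df[["f1", "f2"]], df["label"]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    X_train, X_test = X.iloc[train_idx], X.iloc[test_idx]   # positional, not Boolean
    y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]
print(len(X_train), len(X_test))
```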