Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that each data point gets a chance to be validated. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.

2604 questions
0
votes
0 answers

Creating a rolling window forecast in R

I need your help understanding rolling window and expanding window forecasting strategies in R. I am using inflation data from Thailand between January 2003 and December 2014. My problem is as follows: A) I wish to conduct an out-of-sample…
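The split logic behind the two strategies is language-agnostic; a minimal Python sketch (hypothetical series length, window size, and horizon) of how the training indices differ:

```python
import numpy as np

y = np.random.randn(144)   # placeholder: 12 years of monthly inflation data
window, horizon = 60, 1    # hypothetical 5-year window, one-step-ahead forecast

for origin in range(window, len(y) - horizon + 1):
    rolling_train   = y[origin - window:origin]   # fixed-size window slides forward
    expanding_train = y[:origin]                  # training set grows each round
    test            = y[origin:origin + horizon]  # out-of-sample target
    # fit the model on rolling_train or expanding_train, then forecast `test`
```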
0
votes
0 answers

Retrain model after cross-validation

So, as can be seen here, here and here, we should retrain our model using the whole dataset after we are satisfied with our CV results. Check the following code to train a Random Forest: from sklearn.ensemble import RandomForestClassifier from…
Murilo
  • 533
  • 3
  • 15
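A minimal sketch of that workflow, with make_classification standing in for the real data: CV estimates how the model will generalize, and the deployed model is then refit on everything.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)  # placeholder data
clf = RandomForestClassifier(random_state=0)

# 1) Use CV only to estimate generalization performance / tune hyperparameters.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())

# 2) Once satisfied, refit on the whole dataset; this final model is the one you deploy.
clf.fit(X, y)
```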
0
votes
0 answers

Evaluate multiple metrics in a single GridSearchCV with scikit-survival

Currently, I am doing a simulation to compare multiple models; my study doesn't require the best_estimator_, only the results from cv_results_. The problem that I have is that I need the integrated_brier_score and cumulative_dynamic_auc for each…
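scikit-survival's integrated_brier_score and cumulative_dynamic_auc are plain functions, so each would need to be wrapped as a callable scorer; the general multi-metric pattern in scikit-learn itself is sketched below, with a standard classifier and built-in metrics as stand-ins. With a dict of scorers and refit=False, GridSearchCV records every metric in cv_results_ without needing a best_estimator_.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)  # placeholder data

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100]},
    scoring={"acc": "accuracy", "auc": "roc_auc"},  # one entry per metric
    refit=False,   # no single best_estimator_; all metrics land in cv_results_
    cv=5,
)
grid.fit(X, y)
print(grid.cv_results_["mean_test_acc"], grid.cv_results_["mean_test_auc"])
```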
0
votes
0 answers

Sklearn Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead

I have a multiclass problem (12 classes) and I am using LabelBinarizer to "one-hot-encode" my output. I was trying to use the LabelBinarizer first, then split the data using StratifiedKFold, but got this error: Supported target types are: ('binary',…
Murilo
  • 533
  • 3
  • 15
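One common fix is to let StratifiedKFold split on the original integer labels and apply the binarizer per fold; a sketch with placeholder data:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelBinarizer

X = np.random.randn(120, 4)             # placeholder features
y = np.random.randint(0, 12, size=120)  # integer class labels, 12 classes

lb = LabelBinarizer().fit(y)  # fit on all labels so every fold gets 12 columns
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Split on the integer labels (which StratifiedKFold understands),
# and one-hot encode only where the model actually needs it.
for train_idx, test_idx in skf.split(X, y):
    y_train_ohe = lb.transform(y[train_idx])
    y_test_ohe = lb.transform(y[test_idx])
```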
0
votes
1 answer

Cross-validation logistic regression returns very different accuracies

I'm running cross-validation on logistic regression, and I've run into a strange issue where the train and test accuracy are all 100% except for the very first and second fold, which are about 66% accuracy. 100% accuracy is definitely wrong and I am…
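Without the data it is hard to be sure, but a frequent cause of this exact pattern is rows ordered by class, so that unshuffled folds get degenerate label distributions (target leakage is the other usual suspect). A sketch of the shuffled, stratified alternative on simulated class-sorted data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)  # placeholder data
order = np.argsort(y)          # simulate rows arriving sorted by class label
X, y = X[order], y[order]

# Unshuffled folds on sorted data are degenerate; shuffling + stratifying fixes it.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
print(cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv))
```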
0
votes
0 answers

How to do Cross-Validation on Neural Networks with multiple binary classification outputs?

I am trying to use StratifiedKFold to do cross-validation on my CNN that outputs multiple binary classifications. However, StratifiedKFold is unable to process multilabel indicators. skf = StratifiedKFold(n_splits=10, shuffle=True,…
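scikit-learn's StratifiedKFold only handles single-label targets; the third-party iterative-stratification package offers a drop-in MultilabelStratifiedKFold. A sketch, assuming that package is installed, with placeholder arrays:

```python
import numpy as np
from iterstrat.ml_stratifiers import MultilabelStratifiedKFold

X = np.random.randn(200, 16)           # placeholder inputs
Y = np.random.randint(0, 2, (200, 5))  # 5 binary outputs per sample

# MultilabelStratifiedKFold accepts a multilabel-indicator target directly.
mskf = MultilabelStratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in mskf.split(X, Y):
    X_train, Y_train = X[train_idx], Y[train_idx]
    X_test, Y_test = X[test_idx], Y[test_idx]
    # build and fit the CNN on each fold here
```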
0
votes
0 answers

How do I calculate R-squared from LightGBM cross-validated models in R?

I'm trying to run LightGBM with 5-fold cross-validation to predict the first 123 PCs of a plasma metabolite principal component analysis. I'd like to get the R-squared for the best iteration for each outcome, but can't find a direct way to extract…
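The question is about the R package, but one route that works in either language is computing R² from out-of-fold predictions instead of digging it out of the CV object; in Python (placeholder data standing in for one PC outcome) this looks like:

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict

X = np.random.randn(500, 20)                    # placeholder predictors
y = 0.5 * X[:, 0] + 0.1 * np.random.randn(500)  # placeholder outcome (one PC)

# Out-of-fold predictions from 5-fold CV, then R^2 against the observed values;
# repeat per outcome (per PC) as needed.
oof = cross_val_predict(LGBMRegressor(), X, y, cv=5)
print(r2_score(y, oof))
```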
0
votes
0 answers

How can I draw a dynamic graph in Python?

I just started using Python. I would like to plot a dynamic graph that shows me the performance (in terms of accuracy) of a kNN algorithm obtained with n-fold cross-validation. I would like to get a graph where x = k nearest neighbours and y = average…
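A minimal sketch of that plot, using the iris dataset as a stand-in: one mean CV accuracy per candidate k.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset

ks = range(1, 31)
means = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10).mean()
         for k in ks]

plt.plot(ks, means, marker="o")
plt.xlabel("k nearest neighbours")
plt.ylabel("mean CV accuracy")
plt.show()
```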
0
votes
0 answers

CatBoostError: Bad value for num_feature

I have code: from sklearn.model_selection import KFold, cross_val_predict from catboost import Pool, CatBoostRegressor, cv from sklearn.metrics import mean_squared_error import matplotlib.pyplot as plt import pandas as pd import numpy as np import…
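This error typically points at malformed feature input (for example non-numeric columns not declared as categorical when building the Pool); for comparison, a minimal working cv() call with all-numeric placeholder data:

```python
import numpy as np
from catboost import Pool, cv

X = np.random.randn(200, 5)          # all-numeric placeholder features
y = X[:, 0] + np.random.randn(200)

pool = Pool(X, label=y)  # declare cat_features here if any column is categorical
params = {"loss_function": "RMSE", "iterations": 100, "verbose": False}

results = cv(pool, params, fold_count=5)  # DataFrame of per-iteration fold metrics
print(results.tail(1))
```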
0
votes
0 answers

Databricks PySpark error when attempting to fit a CrossValidator object

I am facing the exact same issue as the one in the linked question: when calling cross-validation in Databricks I get a weird error just like the one mentioned there. Can someone please help?
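Without the actual stack trace the cause is hard to pin down; a minimal self-contained CrossValidator example (toy DataFrame) can at least isolate whether the failure lies in the data pipeline or the cluster environment:

```python
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.linalg import Vectors
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(Vectors.dense([0.0, 1.1]), 0.0), (Vectors.dense([2.0, 1.0]), 1.0),
     (Vectors.dense([0.1, 1.2]), 0.0), (Vectors.dense([2.2, 0.9]), 1.0),
     (Vectors.dense([0.2, 1.3]), 0.0), (Vectors.dense([2.1, 1.1]), 1.0)],
    ["features", "label"])

lr = LogisticRegression()
grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid,
                    evaluator=BinaryClassificationEvaluator(), numFolds=2)
model = cv.fit(df)  # if even this fails, suspect the environment, not the model
```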
0
votes
0 answers

How can I train a caret model with time slices while holding out one or more groups for each cross-validation fold?

I'm trying to train a model on a panel of different units over time. I understand how to use createTimeSlices from the caret package, but I'd like to use this same process while simultaneously holding out different units in different training folds.…
broodoots
  • 7
  • 1
  • 2
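caret's createTimeSlices knows nothing about groups, so one option is to build the fold indices by hand, crossing time origins with held-out units, and hand them to caret through trainControl's index/indexOut arguments. The index construction itself is language-neutral, sketched here in Python with a toy panel:

```python
import numpy as np

# Placeholder panel: 4 units each observed over 20 periods.
units = np.repeat(np.arange(4), 20)
time = np.tile(np.arange(20), 4)

folds = []
for origin in range(10, 20):            # expanding time origin, as in createTimeSlices
    for held_out in np.unique(units):   # additionally hold out one unit per fold
        train = np.where((time < origin) & (units != held_out))[0]
        test = np.where((time == origin) & (units == held_out))[0]
        folds.append((train, test))
```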
0
votes
0 answers

Using SMOTE with imblearn pipeline and cross-validation

This is more of a theoretical question, but I am dealing with a pretty imbalanced dataset. Therefore I want to use SMOTE to rebalance the data in order to achieve better results with my models. Now I read that, to avoid data leakage, only the training…
wihee
  • 55
  • 7
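That reading is correct: resampling must happen inside each training fold only. imblearn's own Pipeline guarantees this, because samplers run only at fit time; a sketch with simulated imbalanced data:

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# SMOTE is applied when the pipeline is fit, i.e. only to the training folds;
# the validation fold of each CV round stays untouched.
pipe = Pipeline([("smote", SMOTE(random_state=0)),
                 ("clf", LogisticRegression(max_iter=1000))])
print(cross_val_score(pipe, X, y, cv=5, scoring="f1"))
```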
0
votes
0 answers

Nested cross-validation for XGBoost and Random Forest models

The inner fold and outer fold don't seem to be correct. I am not sure if I am using the training and testing datasets properly. Any help is welcome :) ... # Scale the data scaler = StandardScaler() X_scaled = scaler.fit_transform(X) # Set the outer…
cabral279
  • 97
  • 10
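A compact nested-CV sketch with placeholder data. Note that the question's code scales X before splitting, which leaks test-fold statistics into training; putting the scaler inside a Pipeline confines it to each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)  # placeholder data

# Scaler inside the pipeline: re-fit on each training fold only, no leakage.
pipe = Pipeline([("scaler", StandardScaler()),
                 ("rf", RandomForestClassifier(random_state=0))])

# Inner loop tunes hyperparameters; outer loop estimates generalization error.
inner = GridSearchCV(pipe, {"rf__max_depth": [3, 5, None]},
                     cv=KFold(5, shuffle=True, random_state=0))
outer_scores = cross_val_score(inner, X, y,
                               cv=KFold(5, shuffle=True, random_state=1))
print(outer_scores.mean())
```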
0
votes
1 answer

How to suppress warning messages when lightgbm is used?

I am using lightgbm to train LGBM models in R. However, whenever I call the lgb.cv() function, lots of warning messages come out. My code is written as: train_params <- list(objective = "binary", learning_rate = 0.2, num_leaves = 50L, …
Phoebe
  • 53
  • 5
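The relevant switch is LightGBM's core verbosity parameter (alias verbose), which both the R and Python APIs pass through; values below zero keep only fatal messages. Shown here in Python form (in the R API the equivalent is verbose = -1 in the parameter list):

```python
import numpy as np
import lightgbm as lgb

X = np.random.randn(500, 10)          # placeholder features
y = np.random.randint(0, 2, 500)      # binary labels
dtrain = lgb.Dataset(X, label=y)

params = {"objective": "binary", "learning_rate": 0.2,
          "num_leaves": 50, "verbosity": -1}  # verbosity < 0: fatal messages only

results = lgb.cv(params, dtrain, num_boost_round=100, nfold=5)
```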
0
votes
1 answer

Marginal R2 for linear mixed models in cross-validation

For a prediction problem I am working on I want to calculate the variance in the data explained by the linear effects of my linear mixed model. To evaluate my predictive performance I plan on using five-fold cross-validation. The common approach to…
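For reference, the marginal R² of Nakagawa & Schielzeth (2013), which captures only the fixed-effects (linear) part of the model, is the fixed-effects variance over the total variance:

    R²_marginal = σ²_f / (σ²_f + Σ σ²_α + σ²_ε)

where σ²_f is the variance of the fixed-effects predictions, Σ σ²_α the summed random-effect variances, and σ²_ε the residual variance. Under cross-validation, the same quantity can be computed per fold from the model fit to the training data.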