Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that every data point gets a chance to be validated against. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
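The k-fold scheme described above can be sketched in plain Python (a minimal, library-free illustration; real implementations such as scikit-learn's KFold also handle shuffling and uneven fold sizes):

```python
def k_fold_indices(n_samples, k):
    """Partition sample indices into k folds; each fold serves once as
    the validation set while the remaining k-1 folds form the training set."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, valid in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, valid))
    return splits

# Example: 10 samples, 5 folds -> every index is validated exactly once
splits = k_fold_indices(10, 5)
```

Each of the 5 rounds trains on 8 indices and validates on the remaining 2, and the validation folds together cover the whole dataset exactly once.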

2604 questions
0
votes
0 answers

K-fold cross validation in PyTorch, augmenting train and valid data separately

I want to perform k-fold CV, but in my past approach the augmentations were leaking into the validation dataset. To prevent this, I am using the WrapperDataset class I found in this post: Augmenting only the training set in K-folds cross validation.…
0
votes
0 answers

What to do after cross-validation

I've read that once I've tuned my hyperparameters using k-fold cross-validation (on the training set), I should train my model on the entire training set and then evaluate my model on the test set. However, doesn't this again introduce the problem…
0
votes
0 answers

In the Python Prophet package, how can I pass the initial, period and horizon arguments as defined string variables instead of hard-coding the strings

The function cross_validation(m, initial='365 days', period='180 days', horizon='365 days') works, but when I parameterize the arguments like this: initial_days = '1456 days' period_days = '28 days' horizon_days = '84 days' cross_validation(m,…
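Prophet's cross_validation takes plain strings for initial, period and horizon, so pre-defined string variables should behave exactly like hard-coded literals; one common pitfall is building them from integers without the ' days' suffix. A minimal sketch of constructing the arguments (the call itself is commented out, since it needs a fitted Prophet model m):

```python
# Build the window arguments as strings once, then reuse them.
initial_days = f"{4 * 364} days"   # '1456 days'
period_days = f"{28} days"         # '28 days'
horizon_days = f"{84} days"        # '84 days'

# With a fitted Prophet model m, the variables are passed like literals:
# from prophet.diagnostics import cross_validation
# df_cv = cross_validation(m, initial=initial_days,
#                          period=period_days, horizon=horizon_days)
```

If the parameterized call fails, printing the variables first usually reveals a malformed string (e.g. a missing ' days' unit) rather than a problem with variables per se.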
0
votes
0 answers

Cross-validation in multi-output neural network

I have created a neural network with two branches, one dedicated to regression and the other one to classification. The inputs consist of 104 columns, 52 of which are numeric positive values, and the other 52 consist of binary values (0 or 1)…
0
votes
0 answers

How can I implement a Leave One Patient Out Cross Validation in Python?

I have a dataset of roughly 1600 samples. The whole dataset is made from 22 patients in total. Some patients contribute 250 samples, others just 10. This is a balanced dataset overall: I have around 800 samples for each class, but the…
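Leave-one-patient-out is leave-one-group-out cross-validation with the patient ID as the group (scikit-learn provides LeaveOneGroupOut for exactly this). A dependency-free sketch of the grouping logic, with hypothetical patient IDs:

```python
def leave_one_patient_out(patient_ids):
    """Yield (train_indices, test_indices) pairs, holding out all samples
    of one patient per split so no patient appears in both sets."""
    patients = sorted(set(patient_ids))
    for held_out in patients:
        test = [i for i, p in enumerate(patient_ids) if p == held_out]
        train = [i for i, p in enumerate(patient_ids) if p != held_out]
        yield train, test

# Example: 6 samples from 3 patients -> 3 splits, one per held-out patient
ids = ["p1", "p1", "p2", "p2", "p2", "p3"]
splits = list(leave_one_patient_out(ids))
```

This keeps patients with many samples (250) and few samples (10) equally isolated: a patient's data is never split across train and test within the same round.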
0
votes
0 answers

XGBoost segmentation fault during xgb.cv regardless of data size

I am trying to hyperparameter tune an XGBoost model using the bayesian-optimization library, and I continually get a segmentation fault during xgboost cross validation, regardless of how large or small my training data is. I have a dataset with 118…
0
votes
0 answers

Fitting a Pipeline (Imputer and Classifier) with StratifiedGroupKfold cross-validation

I've been struggling for a while with this so I thought I would ask here. I have a dataset with some missing values so I wanted to use KNNImputer to fill them in. To test the validity of the features, for a range of k-values I used a RandomForest…
0
votes
0 answers

Hyperparameters tuning and cross-validation of sklearn models

I am confused about the following: splitting data into training, validation, and test sets; how and at which step hyperparameter tuning should be performed, and which data should be used for it. Can a stratified k-fold…
0
votes
0 answers

How to show each fold's confusion matrix in 10-fold cross-validation

Hello. I have a problem where I couldn't get the output for each fold's confusion matrix; what do I do with the code? I expect to get every fold's confusion matrix as a result. I've tried the code that I put below, but the result is different when I try…
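One way to get a confusion matrix per fold is to compute and append one inside the fold loop, rather than overwriting a single variable on each iteration; a minimal pure-Python sketch with made-up per-fold predictions:

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Return [[tn, fp], [fn, tp]] for binary labels 0/1."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m

# Collect one confusion matrix per fold instead of keeping only the last:
fold_matrices = []
for fold_true, fold_pred in [([0, 1, 1], [0, 1, 0]),   # hypothetical fold 1
                             ([1, 0], [1, 1])]:        # hypothetical fold 2
    fold_matrices.append(confusion_matrix_2x2(fold_true, fold_pred))
```

With real cross-validation, the loop body would hold the per-fold predictions from the fitted model; the accumulation pattern is the same.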
0
votes
0 answers

Error for k-fold cross-validation and PCR in R with simulated data

For my thesis I am investigating whether 5-fold cross-validation can be used to find the optimal number of principal components in PCR for time series data. I am using a 3-factor model. However, when I try to run the PCR code I get an error as the data…
0
votes
0 answers

How to apply standardization to the train set inside GridSearchCV?

This is a question more about theory than about a problem in the code itself. I have the following Pipeline, which will then be used in a GridSearchCV: my_model = Pipeline([('scaler', MinMaxScaler()), ('model', model())]) cv = GridSearchCV(my_model ,…
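Placing the scaler inside the Pipeline is the right approach: GridSearchCV then re-fits the scaler on each training fold only, so no validation-fold statistics leak in. The fold-wise logic, sketched without scikit-learn (toy values are illustrative):

```python
def min_max_fit(train_column):
    """Learn min/max on the training fold only."""
    return min(train_column), max(train_column)

def min_max_transform(column, lo, hi):
    """Apply training-fold statistics to any fold. Validation values outside
    [lo, hi] may map outside [0, 1] -- that is expected, not a bug."""
    return [(x - lo) / (hi - lo) for x in column]

train_fold = [2.0, 4.0, 6.0]
valid_fold = [8.0]                  # unseen value larger than the train max
lo, hi = min_max_fit(train_fold)    # statistics come from the train fold only
scaled_valid = min_max_transform(valid_fold, lo, hi)
```

Fitting the scaler on the full dataset before splitting would let validation statistics influence the training transform, which is precisely the leakage the Pipeline avoids.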
0
votes
0 answers

Multiple Time Series Cross Validation - overlapping train test sets

I have a dataset containing sales data for different products over time. The dataset includes a "Time" column representing the date and a "Product" column specifying the product ID. As multiple products can be sold on the same date, the "Time"…
0
votes
0 answers

Time series split with multiple products - python

I have a time series like:

index  date  id  value
0      d1    a   10
1      d2    a   15
2      d2    b   20
3      d3    a   18
4      d3    b   19
5      d4    b   21
6      d4    c   25

I want to do a time series split, so the splits would be: train_1 = index[0],…
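One way to split such data is an expanding window over the distinct dates, so that all rows sharing a date (multiple products) stay on the same side of every split; a minimal sketch (the function name and toy data are illustrative):

```python
def time_series_splits_by_date(dates):
    """Expanding-window splits keyed on the date column: split i trains on
    the first i distinct dates and validates on the next one, so rows that
    share a date always land together."""
    order = sorted(set(dates))  # assumes date labels sort chronologically
    for i in range(1, len(order)):
        train = [idx for idx, d in enumerate(dates) if d in order[:i]]
        test = [idx for idx, d in enumerate(dates) if d == order[i]]
        yield train, test

# Dates as in the question: d2, d3 and d4 each carry multiple products
dates = ["d1", "d2", "d2", "d3", "d3", "d4", "d4"]
splits = list(time_series_splits_by_date(dates))
```

The first split trains on the single d1 row and validates on both d2 rows, matching the train_1 = index[0] scheme sketched in the question; scikit-learn's TimeSeriesSplit does not group by date on its own, so this grouping step has to happen first.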
0
votes
0 answers

For imbalanced binary classification, what should the value of "scoring" be in cross_val_score()?

For imbalanced binary classification, what should the value of "scoring" be in cross_val_score()? Is scoring the same as accuracy in cross_val_score()? For binary classification, can we use f1_weighted or f1_macro? My dataset is an imbalanced binary…
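Plain accuracy can look high on imbalanced data even for a useless majority-class predictor, which is why macro-averaged F1 (scoring='f1_macro' in cross_val_score) is often preferred; a dependency-free sketch of what that metric computes:

```python
def f1_per_class(y_true, y_pred, cls):
    """F1 for one class from true-positive / false-positive / false-negative counts."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1: the minority class counts as much
    as the majority class, unlike plain accuracy."""
    classes = sorted(set(y_true))
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)

# A degenerate majority-class predictor: 90% accurate, yet macro-F1 is low
y_true = [0] * 9 + [1]
y_pred = [0] * 10
```

Here accuracy is 0.9 while macro-F1 is pulled down by the minority class's F1 of 0, which is exactly the failure mode the question is about; f1_weighted instead weights each class's F1 by its support.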
0
votes
1 answer

Why does cross_val_score using LeaveOneOut() lead to a nan validation score?

I was trying to fit different cross_val_score types (KFold(), LeaveOneOut(), LeavePOut()) on sklearn's iris dataset, but LeaveOneOut() leads to a nan score list. Why is this happening? Can anyone explain? Let me attach part of my code…
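One common cause (assuming a regression-style scorer such as R² is in play) is that with LeaveOneOut() each validation fold contains a single sample, so any metric that needs the variance of the true labels is undefined on that fold; a minimal sketch of why:

```python
def r2_score_sketch(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot. With a single sample, SS_tot is zero,
    so the score is undefined and reported as nan."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot if ss_tot else float("nan")

score = r2_score_sketch([3.0], [2.5])  # a one-sample LeaveOneOut fold
```

Accuracy-style scorers are well defined on one sample, so another frequent culprit is that the estimator raises an error inside a fold and cross_val_score converts it to nan via its error_score behavior; running one fold manually usually surfaces the underlying exception.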