Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that each data point gets a chance to be used for validation. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
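For concreteness, here is a minimal k-fold cross-validation sketch in Python with scikit-learn; the iris data, logistic regression model, and choice of k = 5 are illustrative assumptions rather than part of the tag definition.

# Minimal k-fold cross-validation sketch (dataset, model, and k are arbitrary choices).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Split the data into k folds; each fold is used once for validation
# while the remaining k-1 folds are used for training.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())

Averaging the per-fold scores gives a less noisy estimate of out-of-sample performance than a single train/validation split.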

2604 questions
0
votes
1 answer

Label a certain x,y data point on a validation curve

valid1 = plot_validation_curve(rand_search.best_estimator_, X_train, y_train, cv=StratifiedKFold(n_splits=5), param_range=np.arange(2,100,2), param_name = 'max_depth', scoring='f1') I…
fan-yang
  • 13
  • 3
0
votes
1 answer

Print classification results with k-fold cross-validation using the sklearn package

I have a dataset that I split using the holdout method in sklearn. The following is the procedure: from sklearn.model_selection import train_test_split (X_train, X_test, y_train, y_test)=train_test_split(X,y,test_size=0.3, stratify=y) I am using… (one possible approach is sketched after this entry)
Encipher
  • 1,370
  • 1
  • 14
  • 31
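One way to get a per-class report from a k-fold run, sketched under the assumption that pooled out-of-fold predictions are acceptable (this is not necessarily the asker's exact setup): use cross_val_predict with a StratifiedKFold splitter and pass the predictions to classification_report.

# Hedged sketch: out-of-fold predictions with StratifiedKFold, then a report.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = load_iris(return_X_y=True)          # placeholder data
model = LogisticRegression(max_iter=1000)  # placeholder classifier

# Each sample is predicted by a model trained on the folds it was not in.
y_pred = cross_val_predict(model, X, y, cv=StratifiedKFold(n_splits=5))
print(classification_report(y, y_pred))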
0
votes
1 answer

Get the best model after cross validation

How do I get the best model after training with k-fold cross-validation without a grid search? For example: model = XGBClassifier(**best_params) cv_scores = cross_val_score(model, X_train, Y_train, cv=5, scoring='f1') I am not sure how to get the… (a hedged sketch of two options follows this entry)
Vicky
  • 33
  • 5
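cross_val_score only returns the scores, not the fitted models. A hedged sketch of two common options, with a scikit-learn classifier standing in for XGBClassifier(**best_params): keep the per-fold estimators via cross_validate(..., return_estimator=True), or use cross-validation only for evaluation and refit on the full training set.

# Hedged sketch; GradientBoostingClassifier stands in for XGBClassifier(**best_params).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate

X_train, Y_train = make_classification(n_samples=500, random_state=0)
model = GradientBoostingClassifier()

# return_estimator=True keeps the model fitted on each training fold.
results = cross_validate(model, X_train, Y_train, cv=5,
                         scoring="f1", return_estimator=True)

best_fold = int(np.argmax(results["test_score"]))
best_model = results["estimator"][best_fold]   # model from the best-scoring fold

# More common choice: use CV only to estimate performance, then refit on all data.
final_model = GradientBoostingClassifier().fit(X_train, Y_train)

Selecting the per-fold "best" estimator optimistically biases its reported score, which is why refitting on the full training set is usually preferred.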
0
votes
0 answers

How to do cross-validation on a docplex MILP model?

I've created some mixed integer linear programming models for feature selection in classification based on support vector machines. Now I need to do cross-validation on these models, but I can't figure out how to use the scikit-learn library to apply…
0
votes
0 answers

HDBSCAN Random Search Fine-Tuning

Context: I am trying to fine-tune my HDBSCAN model from the hdbscan Python library using sklearn's RandomizedSearchCV. However, I am facing the following error: scores = scorer(estimator, X_test) ^^^^^^^^^^^^^^^^^^^^^^^^^ TypeError:…
Mayow
  • 1
  • 1
0
votes
0 answers

Tuning Arguments for CV / Regression Trees

When I enter: tune_spec<- decision_tree(min_n= tune(), tree_depth= tune(), cost_complexity=tune()) %>% set_engine("rpart") %>% set_mode("regression") tree_grid<- tune_spec %>% extract_parameter_set_dials() %>% …
0
votes
0 answers

Surprise NMF object is not callable

I am building a recommender system using the Sushi Preference Dataset and the NMF (Non-negative Matrix Factorization) model. I am implementing it using the Surprise library. I want to use randomized search CV for hyperparameter tuning…
0
votes
0 answers

How to do nested cross-validation on folds in R using the glmnet package

I am trying to generate a model using the glmnet package in R. I want to do these steps: Randomly split the data into 5 folds. For each fold: a. Remove the fold from the data. b. Use the remaining data to train an elastic-net model using 10-fold…
rheabedi1
  • 65
  • 7
0
votes
0 answers

sklearn cross_val_score always returning the same non-zero values

I tried using a logistic regression model to predict some data, and the first time I used cross_val_score it seemed fine. But when I tried to drop some of the less important features and rerun cross_val_score on the reduced data, it gives the same…
Chris
  • 155
  • 6
0
votes
1 answer

Why is the mean ROC score from GridSearchCV using only 1 CV split different from the ROC calculated with the grid_search.score method or the roc_auc_score function?

I was experimenting with sklearn's GridSearchCV, and I don't understand why the mean ROC scores I get when using a single split defined with an iterable are different from what I get running the score method after fitting, or the roc_auc_score…
0
votes
2 answers

Understanding the Substantial Performance Discrepancy between Stratified K-Fold Cross Validation and No Cross Validation in my Prediction

I have developed two versions of my code, where one incorporates stratified k-fold cross-validation while the other lacks any form of cross-validation. To my surprise, the results achieved using stratified k-fold cross-validation significantly…
0
votes
0 answers

Performing backward variable selection in R based on test data prediction

How can I apply backward variable selection based on performance on test data in R? I already know that there is the stepAIC() function, which does almost what I want, but in every step it removes one variable based on the AIC criterion. I want to do…
Joshua_ABC
  • 13
  • 3
0
votes
0 answers

Cross-validation score / training score / test score: which should I consider to say whether a model is well generalised?

I am new to the machine learning domain and I want to clear up a doubt. My model is a multi-class classification model based on a SMILES notation dataset. My dataset has fewer than 1000 rows and it is also imbalanced. Suppose I am getting high…
0
votes
0 answers

How to set optimal number of trees

I'm working with the Boston Housing data set, making models using trees. It's possible to calculate the optimal number of trees using cross-validation, as the last line shows (in this case 8 trees): library(tree) library(MASS) tree.test.RMSE <- 0 df…
Russ Conte
  • 124
  • 6
0
votes
0 answers

How to handle hyperparameter tuning for LSTM with early stopping?

I am looking for advice on the best practice for determining hyperparameters for my LSTM model. I have time series data that I have divided into train and test sets. I was planning to use an expanding walk-forward cross-validation scheme on my train… (a toy splitting sketch follows this entry)
Merry
  • 215
  • 2
  • 7
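A toy sketch of expanding walk-forward splits, assuming scikit-learn's TimeSeriesSplit and an arbitrary 80/20 split of each training window into a fitting slice and an early-stopping slice; the LSTM fit itself is left as a commented placeholder, so this is not the asker's actual setup.

# Toy expanding walk-forward splits; the 80/20 early-stopping slice is an assumption.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(100, dtype=float).reshape(-1, 1)   # toy time-series features
y = np.arange(100, dtype=float)                  # toy targets

tscv = TimeSeriesSplit(n_splits=5)               # training window expands each fold
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    cut = int(len(train_idx) * 0.8)              # most recent 20% reserved for early stopping
    fit_idx, val_idx = train_idx[:cut], train_idx[cut:]

    # model.fit(X[fit_idx], y[fit_idx],
    #           validation_data=(X[val_idx], y[val_idx]),
    #           callbacks=[EarlyStopping(...)])  # placeholder for the actual LSTM
    print(f"fold {fold}: fit={len(fit_idx)} val={len(val_idx)} test={len(test_idx)}")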