Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that every data point is eventually used for validation. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
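For readers new to the tag, here is a minimal k-fold sketch in Python with scikit-learn (the dataset and model are placeholders chosen only for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for a real dataset.
X, y = make_classification(n_samples=200, random_state=0)
model = LogisticRegression(max_iter=1000)

# With cv=5, each of the 5 folds serves once as the validation set
# while the remaining 4 folds are used for training.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```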

2604 questions
17
votes
3 answers

Grid Search and Early Stopping Using Cross Validation with XGBoost in SciKit-Learn

I am fairly new to scikit-learn and have been trying to hyperparameter-tune XGBoost. My aim is to use grid search to tune the model parameters and early stopping to control the number of trees and avoid overfitting. As I am…
George
  • 674
  • 2
  • 7
  • 19
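One common pattern for this, as a sketch rather than a definitive answer (the early-stopping API has moved between XGBoost versions, and reusing one fixed eval_set across all CV fits is a known simplification):

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, random_state=0)
# Hold out a fixed validation set to drive early stopping.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# In recent XGBoost releases early_stopping_rounds is a constructor
# argument; in older ones it was a fit() argument. Adjust as needed.
model = xgb.XGBClassifier(n_estimators=1000, early_stopping_rounds=10,
                          eval_metric="logloss")
grid = GridSearchCV(model,
                    {"max_depth": [3, 5], "learning_rate": [0.05, 0.1]},
                    cv=3)
# fit params are forwarded to every CV fit; the same eval_set is reused
# in each fold here, which is the simplification noted above.
grid.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print(grid.best_params_, grid.best_estimator_.best_iteration)
```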
17
votes
2 answers

How to customize sklearn cross validation iterator by indices?

Similar to "Custom cross validation split sklearn", I want to define my own splits for GridSearchCV, for which I need to customize the built-in cross-validation iterator. I want to pass my own set of train-test indices for cross validation to the…
tangy
  • 3,056
  • 2
  • 25
  • 42
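For context, GridSearchCV's cv parameter already accepts any iterable of (train_indices, test_indices) pairs, so predefined splits can be passed directly (the indices below are made up for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=10, random_state=0)

# Hand-written splits: each tuple is (train indices, test indices).
custom_cv = [
    (np.array([0, 1, 2, 3, 4]), np.array([5, 6, 7, 8, 9])),
    (np.array([5, 6, 7, 8, 9]), np.array([0, 1, 2, 3, 4])),
]
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.1, 1.0]}, cv=custom_cv)
grid.fit(X, y)
print(grid.best_params_)
```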
16
votes
1 answer

How to apply oversampling when doing Leave-One-Group-Out cross validation?

I am working on imbalanced data for classification, and previously I used the Synthetic Minority Over-sampling Technique (SMOTE) to oversample the training data. However, this time I think I also need to use a Leave One Group Out (LOGO)…
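A sketch of one way to combine the two, assuming the imbalanced-learn package: wrapping SMOTE in an imblearn Pipeline applies the oversampling inside each training fold only, while LeaveOneGroupOut drives the splits (data and groups below are synthetic):

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Imbalanced synthetic data (~10% minority) with 5 made-up groups.
X, y = make_classification(n_samples=300, weights=[0.9, 0.1],
                           random_state=0)
groups = np.repeat(np.arange(5), 60)

# SMOTE runs only on the training portion of each LOGO split.
pipe = Pipeline([("smote", SMOTE(random_state=0)),
                 ("clf", LogisticRegression(max_iter=1000))])
scores = cross_val_score(pipe, X, y, groups=groups,
                         cv=LeaveOneGroupOut(), scoring="f1")
print(scores)
```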
16
votes
1 answer

How to standardize data with sklearn's cross_val_score()

Let's say I want to use a LinearSVC to perform k-fold cross-validation on a dataset. How would I perform standardization on the data? The best practice I have read is to build your standardization model on your training data, then apply this model to…
als5ev
  • 175
  • 1
  • 5
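The standard way to get exactly that behavior, sketched with placeholder data: put the scaler and the model in a Pipeline, so the scaler is fit on each training fold only and then applied to the corresponding validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

# The pipeline re-fits StandardScaler inside every CV fold, so no
# information from the validation fold leaks into the scaling.
pipe = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
print(cross_val_score(pipe, X, y, cv=5))
```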
16
votes
2 answers

Cross validation with grid search returns worse results than default

I'm using scikit-learn in Python to run some basic machine learning models. Using the built-in GridSearchCV() function, I determined the "best" parameters for different techniques, yet many of these perform worse than the defaults. I include the…
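One frequent cause, sketched below without claiming it explains this exact case: if the searched grid does not contain the library defaults, the best parameters within the grid can still lose to them. Including the defaults explicitly guards against that (the grid values here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

param_grid = {
    "max_depth": [3, 10, None],       # None is the sklearn default
    "min_samples_split": [2, 5, 10],  # 2 is the sklearn default
}
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```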
16
votes
3 answers

What is the difference between cross_val_score with scoring='roc_auc' and roc_auc_score?

I am confused about the difference between the cross_val_score scoring metric 'roc_auc' and the roc_auc_score that I can just import and call directly. The documentation…
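For illustration, a side-by-side sketch on synthetic data: the 'roc_auc' scorer in cross_val_score is computed per held-out fold from continuous scores, whereas calling roc_auc_score on hard class predictions answers a different question.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)
model = LogisticRegression(max_iter=1000)

fold_aucs = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
proba = cross_val_predict(model, X, y, cv=5,
                          method="predict_proba")[:, 1]
labels = cross_val_predict(model, X, y, cv=5)

print(fold_aucs.mean())         # mean of per-fold AUCs on scores
print(roc_auc_score(y, proba))  # pooled AUC on probabilities
print(roc_auc_score(y, labels)) # AUC on hard labels: usually lower
```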
16
votes
2 answers

Difference between glmnet() and cv.glmnet() in R?

I'm working on a project that would show the potential influence a group of events has on an outcome. I'm using the glmnet package, specifically its Poisson family. Here's my code: # de <- data imported from sql connection x <-…
Sean Branchaw
  • 597
  • 1
  • 5
  • 21
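The question is about R, but the conceptual split has a rough scikit-learn analogue (shown in Python, like this listing's other examples): glmnet() fits a whole regularization path, while cv.glmnet() additionally runs k-fold CV to choose the penalty strength. A sketch of the analogous pair:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, lasso_path

X, y = make_regression(n_samples=200, n_features=20, random_state=0)

# Like glmnet(): coefficients along a regularization path, no CV.
alphas, coefs, _ = lasso_path(X, y)

# Like cv.glmnet(): the same path, plus 10-fold CV to pick the penalty.
cv_model = LassoCV(cv=10).fit(X, y)
print(cv_model.alpha_)
```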
16
votes
2 answers

I have many more than three elements in every class, but I get this error: "class cannot be less than k=3 in scikit-learn"

This is my target (y): target = [7,1,2,2,3,5,4, 1,3,1,4,4,6,6, 7,5,7,8,8,8,5, 3,3,6,2,7,7,1, 10,3,7,10,4,10, 2,2,2,7] I do not know why I get the error while executing: ... # Split the data set in two equal parts X_train, X_test,…
postgres
  • 2,242
  • 5
  • 34
  • 50
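A likely explanation, sketched on the target above: after the data is halved, some classes are left with fewer than 3 samples in the training split, which is exactly what a stratified 3-fold splitter rejects (the random_state below is arbitrary):

```python
from collections import Counter

from sklearn.model_selection import train_test_split

target = [7,1,2,2,3,5,4,1,3,1,4,4,6,6,7,5,7,8,8,8,5,
          3,3,6,2,7,7,1,10,3,7,10,4,10,2,2,2,7]
X = list(range(len(target)))

# Split the data set in two equal parts, as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, target, test_size=0.5, random_state=0)

# Several classes (5, 6, 8, 10) have only 3 samples in total, so the
# training half is likely to keep fewer than 3 of them.
print(Counter(y_train))
```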
15
votes
2 answers

Trying to Understand FB Prophet Cross Validation

I have a dataset with 84 monthly sales figures (from 01/2013 to 12/2019) - just months, not days. Month 01 | Sale 1 Month 02 | Sale 2 Month 03 | Sale 3 .... | ... Month 84 | Sale 84 From the visualization it looks like the model fits very well…
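For orientation, a sketch of Prophet's rolling-origin cross-validation on a monthly series (the placeholder values and window sizes are illustrative; note that Prophet expresses initial, period, and horizon in days even for monthly data):

```python
import pandas as pd
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

# 84 monthly observations; the y values are placeholders.
df = pd.DataFrame({
    "ds": pd.date_range("2013-01-01", periods=84, freq="MS"),
    "y": range(84),
})
m = Prophet().fit(df)

# Train on the first ~5 years, then forecast 365 days ahead from
# cutoffs spaced 180 days apart.
df_cv = cross_validation(m, initial="1825 days", period="180 days",
                         horizon="365 days")
print(performance_metrics(df_cv)[["horizon", "rmse"]].head())
```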
15
votes
0 answers

Nested cross-validation example on Scikit-learn

I'm trying to wrap my head around the example of nested vs. non-nested CV in sklearn. I checked multiple answers but I am still confused by the example. To my knowledge, a nested CV aims to use a different subset of data to select the best…
NCL
  • 355
  • 2
  • 4
  • 12
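The pattern in question, as a minimal sketch (synthetic data, arbitrary grid): the inner GridSearchCV selects hyperparameters, and the outer cross_val_score estimates the generalization error of that whole selection procedure.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Inner loop: hyperparameter selection on each outer training split.
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)

# Outer loop: each held-out fold scores a model whose hyperparameters
# were chosen without ever seeing that fold.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```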
15
votes
2 answers

Sklearn: Cross validation for grouped data

I am trying to implement a cross-validation scheme on grouped data. I was hoping to use the GroupKFold method, but I keep getting an error. What am I doing wrong? The code (slightly different from the one I used, since I had different data, so I had a…
sw007sw
  • 161
  • 1
  • 5
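For reference, a minimal working GroupKFold sketch on toy arrays: groups must be passed to split(), one label per sample, and the number of distinct groups must be at least n_splits.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
groups = np.array([0, 0, 1, 1, 2, 2])   # 3 groups >= n_splits

gkf = GroupKFold(n_splits=3)
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    # No group ever appears in both the train and test side.
    print(train_idx, test_idx)
```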
15
votes
2 answers

Saving a cross-validation trained model in Scikit

I have trained a model in scikit-learn using cross-validation and a Naive Bayes classifier. How can I persist this model to later run against new instances? Here is what I have: I can get the CV scores, but I don't know how to access the…
Ali
  • 1,605
  • 1
  • 13
  • 19
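A sketch of the usual workflow (the file name is illustrative): cross_val_score returns only scores, not fitted models, so the estimator is refit on all the data and persisted with joblib, which is what the scikit-learn docs recommend for model persistence.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=0)
clf = GaussianNB()

print(cross_val_score(clf, X, y, cv=5))  # CV estimates performance only
clf.fit(X, y)                            # final model on all the data
joblib.dump(clf, "nb_model.joblib")      # reload via joblib.load(...)
```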
15
votes
2 answers

sklearn KFold: access a single fold instead of using a for loop

After using cross_validation.KFold(n, n_folds=folds), I would like to access the indexes for training and testing of a single fold, instead of going through all the folds. So let's take the example code: from sklearn import cross_validation X =…
NumesSanguis
  • 5,832
  • 6
  • 41
  • 76
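With the current API (the cross_validation module in the question has since been removed in favor of model_selection), a single fold is just one item of the split() generator:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)
kf = KFold(n_splits=5)

# Materialize all (train, test) index pairs, then pick one directly.
folds = list(kf.split(X))
train_idx, test_idx = folds[2]   # e.g. fold number 2, no loop needed
print(train_idx, test_idx)
```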
15
votes
1 answer

CARET. Relationship between data splitting and trainControl

I have carefully read the CARET documentation at http://caret.r-forge.r-project.org/training.html and the vignettes, and everything is quite clear (the examples on the website help a lot!), but I am still confused about the relationship between two…
Amelio Vazquez-Reina
  • 91,494
  • 132
  • 359
  • 564
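The question concerns caret, but the relationship it asks about is stack-independent: a createDataPartition-style split holds out a final test set, while trainControl-style resampling happens inside the training set. A rough scikit-learn analogue, in Python like this listing's other examples:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, random_state=0)

# Outer split, analogous to createDataPartition: a test set the
# resampling never touches.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Inner resampling, analogous to trainControl(method="cv", number=10),
# runs entirely within the training set.
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.1, 1, 10]}, cv=10)
grid.fit(X_tr, y_tr)
print(grid.score(X_te, y_te))    # honest estimate on held-out data
```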
14
votes
1 answer

Why, when I use GridSearchCV with roc_auc scoring, is the score different for grid_search.score(X, y) and roc_auc_score(y, y_predict)?

I am using stratified 10-fold cross validation to find the model that predicts y (a binary outcome) from X (which has 34 labels) with the highest AUC. I set up the GridSearchCV: log_reg = LogisticRegression() parameter_grid = {'penalty' : ["l1", "l2"],'C':…
huda95x
  • 149
  • 1
  • 1
  • 5
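The usual explanation, sketched below on synthetic data: grid_search.score(X, y) applies the 'roc_auc' scorer to continuous decision scores, while roc_auc_score(y, grid.predict(X)) is computed on hard 0/1 labels, which typically gives a lower number.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.1, 1, 10]}, cv=10, scoring="roc_auc")
grid.fit(X, y)

print(grid.score(X, y))                   # scorer on continuous scores
print(roc_auc_score(y, grid.predict(X)))  # AUC on hard labels: differs
# Feeding probabilities reproduces what the scorer computes:
print(roc_auc_score(y, grid.predict_proba(X)[:, 1]))
```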