Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that each data point gets a chance to be validated. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
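As a concrete illustration of the k-fold scheme described above, a minimal sketch in scikit-learn (the dataset and model are arbitrary stand-ins): every sample lands in the validation fold exactly once.

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score

    X, y = load_iris(return_X_y=True)
    # 5 folds: each round trains on 4/5 of the data, validates on the rest.
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                             cv=KFold(n_splits=5, shuffle=True, random_state=0))
    print(scores.mean())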

2604 questions
12 votes · 1 answer

Caret Package: Stratified Cross Validation in Train Function

Is there a way to perform stratified cross-validation when using the train function to fit a model to a large imbalanced data set? I know straightforward k-fold cross-validation is possible, but my categories are highly unbalanced. I've seen…
Windstorm1981
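The question is about R's caret, but the technique is library-agnostic; as a sketch of the idea, stratified k-fold in scikit-learn on synthetic imbalanced data (all names and sizes below are stand-ins, not the asker's setup):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    # Assumed 95/5 class split, standing in for the asker's unbalanced set.
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                               random_state=0)

    # StratifiedKFold keeps the class ratio roughly constant in every fold.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                             cv=cv, scoring="f1")
    print(scores.mean())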
12 votes · 2 answers

Creating folds for k-fold CV in R using Caret

I'm trying to set up k-fold CV for several classification methods/hyperparameters using the data available at http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data. This set is made of 208…
gcolucci
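caret::createFolds is the R-side tool here; for comparison, a sketch of generating reusable fold indices with scikit-learn (the data below is random, shaped like the Sonar set only for illustration):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    rng = np.random.default_rng(0)
    X = rng.normal(size=(208, 60))    # shaped like Sonar (208 x 60), not the real data
    y = rng.integers(0, 2, size=208)  # placeholder labels

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    folds = list(cv.split(X, y))
    # `folds` can be passed unchanged as cv= to several models, so every
    # method/hyperparameter combination is scored on identical splits.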
11 votes · 1 answer

Cross-validation and parameters tuning with XGBoost and hyperopt

One way to do nested cross-validation with an XGB model would be:

    from sklearn.model_selection import GridSearchCV, cross_val_score
    from xgboost import XGBClassifier
    # Let's assume that we have some data for a binary classification
    # problem : X…
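Filling out the truncated idea as a hedged sketch (stand-in data; the grid values are arbitrary): an inner GridSearchCV tunes the booster, an outer cross_val_score estimates generalization, which is the usual shape of nested CV. (hyperopt could replace the inner grid search.)

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=500, random_state=0)  # stand-in data

    inner = GridSearchCV(XGBClassifier(),
                         param_grid={"max_depth": [2, 4],
                                     "n_estimators": [50, 100]},
                         cv=3)
    # Each outer fold refits the whole inner search, so the tuned model is
    # always scored on data it never saw during tuning.
    outer_scores = cross_val_score(inner, X, y, cv=5)
    print(outer_scores.mean())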
11 votes · 1 answer

Compare ways to tune hyperparameters in scikit-learn

This post is about the differences between LogisticRegressionCV, GridSearchCV and cross_val_score. Consider the following setup:

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression,…
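A compact sketch of how the three tools relate on that setup (the grid of C values is an assumption): LogisticRegressionCV tunes C internally, GridSearchCV tunes any estimator, and cross_val_score only scores a fixed configuration.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
    from sklearn.model_selection import GridSearchCV, cross_val_score

    X, y = load_digits(return_X_y=True)
    Cs = np.logspace(-2, 2, 5)

    lrcv = LogisticRegressionCV(Cs=Cs, cv=5, max_iter=2000).fit(X, y)
    gs = GridSearchCV(LogisticRegression(max_iter=2000),
                      {"C": Cs}, cv=5).fit(X, y)
    fixed = cross_val_score(LogisticRegression(C=1.0, max_iter=2000),
                            X, y, cv=5)
    print(lrcv.C_, gs.best_params_, fixed.mean())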
11 votes · 2 answers

How to implement SMOTE in cross validation and GridSearchCV

I'm relatively new to Python. Can you help me turn my SMOTE implementation into a proper pipeline? What I want is to apply the over- and under-sampling on the training set of every k-fold iteration so that the model is trained on a balanced data…
MLearner
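The usual resolution, sketched here with assumed stand-in data: imbalanced-learn's Pipeline applies SMOTE only to the training portion of each fold, so GridSearchCV never leaks synthetic samples into the validation fold.

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, StratifiedKFold

    X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                               random_state=0)

    # The sampler runs only on fit(), i.e. only on each training split.
    pipe = Pipeline([("smote", SMOTE(random_state=0)),
                     ("clf", LogisticRegression(max_iter=1000))])
    grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]},
                        cv=StratifiedKFold(5), scoring="f1")
    grid.fit(X, y)
    print(grid.best_params_)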
11 votes · 2 answers

Using cross validation and AUC-ROC for a logistic regression model in sklearn

I'm using the sklearn package to build a logistic regression model and then evaluate it. Specifically, I want to do so using cross-validation, but can't figure out the right way to do it with the cross_val_score function. According to the…
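For what the excerpt is after, a minimal sketch on synthetic data: scoring="roc_auc" makes cross_val_score rank by decision scores rather than hard labels.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, random_state=0)
    aucs = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                           cv=5, scoring="roc_auc")
    print(aucs.mean())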
11 votes · 1 answer

scikit-learn: cross_val_predict only works for partitions

I am struggling to work out how to implement TimeSeriesSplit in sklearn. The suggested answer at the link below (sklearn TimeSeriesSplit cross_val_predict only works for partitions) yields the same ValueError. Here is the relevant bit from my code:

    from…
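The ValueError arises because cross_val_predict requires every sample to appear in exactly one test fold, which TimeSeriesSplit does not guarantee (the first training window is never tested). A sketch of the usual workaround, looping over the splits by hand on stand-in data:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import TimeSeriesSplit

    X, y = make_regression(n_samples=200, random_state=0)  # stand-in series
    preds, idx = [], []
    for train_i, test_i in TimeSeriesSplit(n_splits=5).split(X):
        model = Ridge().fit(X[train_i], y[train_i])
        preds.append(model.predict(X[test_i]))
        idx.append(test_i)
    # Out-of-sample predictions for every sample after the first window.
    preds, idx = np.concatenate(preds), np.concatenate(idx)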
11 votes · 1 answer

sklearn grid search with grouped K fold cv generator

I am trying to implement a grid search over parameters in sklearn using randomized search and a grouped k-fold cross-validation generator. The following…
Sam Weisenthal
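A sketch of the combination the question describes (synthetic data and an assumed group layout): the groups array is forwarded through fit(), not given to the splitter, so no two folds share a group.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GroupKFold, RandomizedSearchCV

    X, y = make_classification(n_samples=300, random_state=0)
    groups = np.repeat(np.arange(30), 10)   # assumed: 30 groups of 10 samples

    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        {"max_depth": [2, 4, 8], "n_estimators": [50, 100]},
        n_iter=4, cv=GroupKFold(n_splits=5), random_state=0)
    search.fit(X, y, groups=groups)  # groups must be passed here
    print(search.best_params_)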
11 votes · 1 answer

Caret package - cross-validating GAM with both smooth and linear predictors

I would like to cross-validate a GAM model using caret. My GAM model has a binary outcome variable, an isotropic smooth of latitude and longitude coordinate pairs, and then linear predictors. Typical syntax when using mgcv is:

    gam1 <- gam( y ~ s(lat…
Paul Lantos
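caret/mgcv is R territory; as a rough Python-side analogue of the same model shape (using pygam, an assumption not taken from the question), a tensor smooth over lat/lon plus a linear term, cross-validated by hand on placeholder data:

    import numpy as np
    from pygam import LogisticGAM, te, l
    from sklearn.model_selection import StratifiedKFold

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 3))     # columns: lat, lon, one linear predictor
    y = rng.integers(0, 2, size=300)  # placeholder binary outcome

    accs = []
    for tr, va in StratifiedKFold(5, shuffle=True, random_state=0).split(X, y):
        # te(0, 1) ~ a 2-D smooth of lat/lon; l(2) ~ a linear term.
        gam = LogisticGAM(te(0, 1) + l(2)).fit(X[tr], y[tr])
        accs.append((gam.predict(X[va]) == y[va]).mean())
    print(np.mean(accs))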
11 votes · 1 answer

How to get classes labels from cross_val_predict used with predict_proba in scikit-learn

I need to train a Random Forest classifier using 3-fold cross-validation. For each sample, I need to retrieve the prediction probability when it happens to be in the test set. I am using scikit-learn version 0.18.dev0. This new version adds the…
gc5
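The answer the question is circling, shown as a sketch on synthetic data: with method="predict_proba", the columns of the returned matrix follow the sorted unique labels of y, the same order as each fold estimator's classes_.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                               random_state=0)
    proba = cross_val_predict(RandomForestClassifier(random_state=0), X, y,
                              cv=3, method="predict_proba")
    labels = np.unique(y)   # column i of `proba` is P(class == labels[i])
    print(labels, proba.shape)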
11 votes · 2 answers

Spark CrossValidatorModel access other models than the bestModel?

I am using Spark 1.6.1. Currently I am using a CrossValidator to train my ML Pipeline with various parameters. After the training process I can use the bestModel property of the CrossValidatorModel to get the model that performed best during the…
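In Spark 1.6.1 (the asker's version) only bestModel survives training. Since Spark 2.3 the fold-level models can be retained via collectSubModels; a pyspark sketch, with a tiny made-up DataFrame standing in for the real pipeline:

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    train_df = spark.createDataFrame(
        [(Vectors.dense([0.0, 1.0]), 0.0),
         (Vectors.dense([1.0, 0.0]), 1.0)] * 20,
        ["features", "label"])  # toy separable data, not the asker's set

    lr = LogisticRegression()
    grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
    cv = CrossValidator(estimator=lr, estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(),
                        numFolds=3, collectSubModels=True)
    model = cv.fit(train_df)
    # model.subModels[fold][param_index] holds every fitted model,
    # alongside the usual model.bestModel.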
11 votes · 2 answers

StratifiedKFold vs StratifiedShuffleSplit vs StratifiedKFold + Shuffle

What is the difference between StratifiedKFold, StratifiedShuffleSplit, and StratifiedKFold + shuffle? When should I use each one? When do I get a better accuracy score? Why do I not get similar results? I have put my code and the results below. I am using…
Aizzaac
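The three splitters side by side on synthetic data (a sketch, not the asker's code): KFold partitions the data so each sample is tested exactly once, ShuffleSplit draws independent random splits so samples may repeat or be skipped, and shuffle=True only randomizes which samples land in which fold.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import (StratifiedKFold,
                                         StratifiedShuffleSplit,
                                         cross_val_score)

    X, y = make_classification(n_samples=500, weights=[0.8, 0.2],
                               random_state=0)
    clf = LogisticRegression(max_iter=1000)

    for cv in (StratifiedKFold(5),
               StratifiedKFold(5, shuffle=True, random_state=0),
               StratifiedShuffleSplit(n_splits=5, test_size=0.2,
                                      random_state=0)):
        print(type(cv).__name__, cross_val_score(clf, X, y, cv=cv).mean())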
11 votes · 2 answers

sklearn: User defined cross validation for time series data

I'm trying to solve a machine learning problem. I have a specific dataset with a time-series element. For this problem I'm using the well-known Python library sklearn. There are a lot of cross-validation iterators in this library, and there are also several…
Demyanov
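Any iterable of (train_indices, test_indices) pairs can serve as the cv= argument, so a user-defined time-aware splitter is just a generator. A sketch with an expanding window over assumed block sizes (all data below is random stand-in):

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 300
    X, y = rng.normal(size=(n, 5)), rng.normal(size=n)

    def expanding_window(n_samples, n_splits=5, test_size=30):
        # Train on everything before each test block; never look forward.
        for k in range(n_splits):
            stop = n_samples - (n_splits - 1 - k) * test_size
            yield np.arange(stop - test_size), np.arange(stop - test_size, stop)

    scores = cross_val_score(Ridge(), X, y, cv=list(expanding_window(n)))
    print(scores)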
11 votes · 1 answer

Log transform dependent variable for regression tree

I have a dataset where I find that the dependent (target) variable has a skewed distribution, i.e. there are a few very large values and a long tail. When I run the regression tree, one end-node is created for the large-valued observations and one…
airjordan707
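One way to try the idea under cross-validation, sketched on synthetic skewed data: scikit-learn's TransformedTargetRegressor fits the tree on a log-transformed target (log1p here, a safe variant of plain log) and inverts at predict time, so CV scores stay on the original scale.

    import numpy as np
    from sklearn.compose import TransformedTargetRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))
    y = np.exp(X[:, 0] + rng.normal(scale=0.5, size=500))  # long-tailed target

    model = TransformedTargetRegressor(
        regressor=DecisionTreeRegressor(max_depth=4),
        func=np.log1p, inverse_func=np.expm1)
    print(cross_val_score(model, X, y, cv=5, scoring="r2").mean())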
11 votes · 3 answers

R: Cross validation on a dataset with factors

Often I want to run cross-validation on a dataset that contains some factor variables, and after running for a while the routine fails with the error: factor x has new levels Y. For example, using package boot:

    library(boot)
    d…
musically_ut
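The R error comes from factor levels that appear only in a validation fold. The scikit-learn analogue of the usual fix, sketched on made-up categorical data: one-hot encode inside the pipeline with handle_unknown="ignore", so categories unseen during training don't break prediction.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.default_rng(0)
    X = rng.choice(list("abcdefgh"), size=(200, 2))  # categorical features
    y = rng.integers(0, 2, size=200)

    pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                         LogisticRegression())
    print(cross_val_score(pipe, X, y, cv=5).mean())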