Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets must cross over in successive rounds so that each data point gets a chance to be used for validation. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
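
For concreteness, here is a minimal sketch of k-fold cross-validation in scikit-learn; the dataset and estimator are illustrative placeholders, not part of the tag wiki:

    # Minimal sketch of 5-fold cross-validation (illustrative placeholders).
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Each round trains on 4 folds and validates on the held-out fold,
    # so every data point is validated exactly once.
    scores = cross_val_score(model, X, y, cv=5)
    print(scores.mean(), scores.std())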

2604 questions
11 votes • 1 answer

Combining grid search and cross-validation in scikit-learn

To improve Support Vector Machine outcomes, I have to use grid search to find better parameters, together with cross-validation. I'm not sure how to combine them in scikit-learn. Grid search searches for the best parameters…
postgres • 2,242 • 5 • 34 • 50
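
A common way to combine the two is GridSearchCV, which runs cross-validation for every parameter combination; a minimal sketch (the parameter grid below is an illustrative assumption, not a recommendation):

    # Sketch: grid search over SVC parameters, each scored by 5-fold CV.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}

    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X, y)  # one cross-validation run per parameter combination
    print(search.best_params_, search.best_score_)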
10 votes • 1 answer

Differences between RepeatedStratifiedKFold and StratifiedKFold in sklearn

I tried to read the docs for RepeatedStratifiedKFold and StratifiedKFold, but couldn't tell the difference between the two methods except that RepeatedStratifiedKFold repeats StratifiedKFold n times with different randomization in each…
Nemo • 1,124 • 2 • 16 • 39
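
That repetition is indeed the whole difference, and it can be seen directly by counting the generated splits; a small sketch with arbitrary array sizes:

    # Sketch: StratifiedKFold yields n_splits splits; RepeatedStratifiedKFold
    # yields n_splits * n_repeats splits, reshuffling before each repeat.
    import numpy as np
    from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold

    X = np.zeros((20, 3))
    y = np.array([0, 1] * 10)

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

    print(sum(1 for _ in skf.split(X, y)))   # 5
    print(sum(1 for _ in rskf.split(X, y)))  # 15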
10 votes • 4 answers

ValueError: X has 24 features, but DecisionTreeClassifier is expecting 19 features as input

I'm trying to reproduce this GitHub project on my machine, on Topological Data Analysis (TDA). My steps: get best parameters from a cross-validation output; load my dataset; feature selection; extract topological features from the dataset for…
8-Bit Borges • 9,643 • 29 • 101 • 198
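
A mismatch like 24-vs-19 usually means the feature selection fitted during cross-validation was not re-applied at prediction time. A hedged sketch of the usual guard, a Pipeline (the selector and counts here are assumptions, not the project's actual setup):

    # Sketch: bundling feature selection with the classifier so predict()
    # sees the same 19-of-24 columns that were chosen during fit().
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import Pipeline
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=24, random_state=0)

    pipe = Pipeline([
        ("select", SelectKBest(f_classif, k=19)),   # 24 -> 19 features
        ("tree", DecisionTreeClassifier(random_state=0)),
    ])
    pipe.fit(X, y)
    print(pipe.predict(X[:5]))  # selection is re-applied automatically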
10 votes • 1 answer

How to calculate feature importance in each model of cross-validation in sklearn

I am using RandomForestClassifier() with 10-fold cross-validation as follows: clf = RandomForestClassifier(random_state=42, class_weight="balanced") k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42) accuracy = cross_val_score(clf,…
EmJ • 4,398 • 9 • 44 • 105
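
Since cross_val_score only returns scores, collecting per-fold importances needs an explicit loop over the same splitter; a sketch mirroring the question's setup (the synthetic data is a placeholder):

    # Sketch: fit the classifier on each training split manually and
    # record feature_importances_ per fold.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=300, n_features=10, random_state=42)
    clf = RandomForestClassifier(random_state=42, class_weight="balanced")
    k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

    importances = []
    for train_idx, test_idx in k_fold.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        importances.append(clf.feature_importances_)  # one vector per fold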
10 votes • 1 answer

Why is cross_val_predict not appropriate for measuring the generalisation error?

When I train an SVC with cross-validation, y_pred = cross_val_predict(svc, X, y, cv=5, method='predict') cross_val_predict returns one class prediction for each element in X, so that y_pred.shape = (1000,) when m=1000. This makes sense, since cv=5…
zwithouta • 1,319 • 1 • 9 • 22
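
The short answer from the scikit-learn docs: cross_val_predict pools predictions from five differently trained models, so a single metric over the pooled vector is not a cross-validation estimate; cross_val_score averages per-fold scores instead. A sketch contrasting the two:

    # Sketch: per-fold scores (the usual generalisation estimate) versus
    # a single score over pooled cross_val_predict output.
    from sklearn.datasets import load_iris
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import cross_val_predict, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    svc = SVC()

    fold_scores = cross_val_score(svc, X, y, cv=5)  # preferred estimate
    y_pred = cross_val_predict(svc, X, y, cv=5)
    pooled = accuracy_score(y, y_pred)              # mixes 5 models' outputs
    print(fold_scores.mean(), pooled)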
10 votes • 3 answers

Not able to use Stratified K-Fold with a multi-label classifier

The following code is used to do K-Fold validation, but I am unable to train the model as it is throwing the error ValueError: Error when checking target: expected dense_14 to have shape (7,) but got array with shape (1,). My target variable has 7 classes. I…
Sai Pavan • 173 • 1 • 12
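
One common cause here is that StratifiedKFold only accepts a single-label target, not an (n_samples, 7) indicator matrix. A frequently suggested workaround, sketched below with placeholder data, is to stratify on a label derived from each row's label combination while still training on the full multi-label rows:

    # Sketch: derive one label per row from the indicator matrix and
    # stratify on that; the model itself still trains on Y.
    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    # 100 placeholder samples, 3 labels, 4 distinct combinations (25 each)
    Y = np.tile(np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 0]]), (25, 1))
    X = np.zeros((len(Y), 5))                       # placeholder features

    combo = np.array(["".join(map(str, row)) for row in Y])  # one label/row
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, combo):
        pass  # train on X[train_idx], Y[train_idx] (full multi-label rows)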
10 votes • 1 answer

Getting features in RFECV scikit-learn

Inspired by this: http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_with_cross_validation.html#sphx-glr-auto-examples-feature-selection-plot-rfe-with-cross-validation-py I am wondering if there is any way to get the features for…
Javiss • 765 • 3 • 10 • 24
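
After fitting, RFECV exposes the chosen features through support_, ranking_ and get_support(); a small sketch along the lines of the linked example:

    # Sketch: recovering which features RFECV kept.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFECV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    rfecv = RFECV(SVC(kernel="linear"), cv=5)
    rfecv.fit(X, y)

    print(rfecv.n_features_)   # how many features were kept
    print(rfecv.support_)      # boolean mask over the input columns
    print(rfecv.ranking_)      # rank 1 marks the selected features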
10 votes • 3 answers

Applying k-fold Cross Validation model using caret package

Let me start by saying that I have read many posts on Cross Validation and it seems there is much confusion out there. My understanding of it is simply this: perform k-fold Cross Validation, i.e. 10 folds, to understand the average error across…
pmanDS • 193 • 1 • 2 • 10
10 votes • 3 answers

GridSearchCV on LogisticRegression in scikit-learn

I am trying to optimize a logistic regression function in scikit-learn by using a cross-validated grid parameter search, but I can't seem to implement it. It says that Logistic Regression does not implement get_params(), but the documentation…
10 votes • 1 answer

Collecting out-of-fold predictions from a caret model

I want to use the out-of-fold predictions from a caret model to train a second-stage model that includes some of the original predictors. I can collect the out-of-fold predictions as follows: #Load…
Zach • 29,791 • 35 • 142 • 201
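
The question itself is about caret (R); purely for comparison, the scikit-learn analogue of this stacking pattern uses cross_val_predict to generate out-of-fold predictions and then trains a second-stage model on those predictions plus some original predictors. A sketch, not the caret workflow:

    # Sketch (scikit-learn analogue, not caret): out-of-fold predictions
    # from a first-stage model feed a second-stage model.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_predict

    X, y = make_regression(n_samples=200, n_features=5, random_state=0)

    # Out-of-fold predictions: each sample is predicted by a model that
    # never saw it during training.
    oof = cross_val_predict(RandomForestRegressor(random_state=0), X, y, cv=5)

    X_stage2 = np.column_stack([oof, X[:, :2]])  # OOF preds + 2 original cols
    stage2 = LinearRegression().fit(X_stage2, y)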
9 votes • 1 answer

Custom Scoring Function in sklearn Cross Validate

I would like to use a custom function for cross_validate which uses a specific y_test to compute precision; this is a different y_test from the actual target y_test. I have tried a few approaches with make_scorer, but I don't know how to actually…
Tartaglia • 949 • 14 • 20
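
For the standard case, make_scorer wraps any metric of (y_true, y_pred); a minimal sketch is below. Note the caveat relevant to the question: a scorer built this way always receives the fold's own y_test, so scoring against a different target would need a hand-rolled scorer callable instead. The metric here is a placeholder:

    # Sketch: a custom metric passed to cross_validate via make_scorer.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import make_scorer, precision_score
    from sklearn.model_selection import cross_validate

    def my_precision(y_true, y_pred):
        # placeholder metric: macro-averaged precision
        return precision_score(y_true, y_pred, average="macro")

    X, y = load_iris(return_X_y=True)
    res = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=5,
                         scoring={"prec": make_scorer(my_precision)})
    print(res["test_prec"])  # one precision value per fold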
9 votes • 3 answers

Custom Evaluator in PySpark

I want to optimize the hyperparameters of a PySpark Pipeline using a ranking metric (MAP@k). I have seen in the documentation how to use the metrics defined in the Evaluation (Scala), but I need to define a custom evaluator class because MAP@k is…
Amanda • 941 • 2 • 12 • 28
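
One way this is usually done is to subclass pyspark.ml.evaluation.Evaluator and implement _evaluate; the sketch below keeps that structure but uses placeholder logic rather than a real MAP@k computation:

    # Sketch: a custom evaluator that CrossValidator can call.
    # The _evaluate body is placeholder logic, not an actual MAP@k.
    from pyspark.ml.evaluation import Evaluator

    class MapAtKEvaluator(Evaluator):
        def __init__(self, k=10, predictionCol="prediction", labelCol="label"):
            super().__init__()
            self.k = k
            self.predictionCol = predictionCol
            self.labelCol = labelCol

        def _evaluate(self, dataset):
            # compute the metric from the DataFrame here (placeholder)
            rows = dataset.select(self.predictionCol, self.labelCol)
            return float(rows.count())

        def isLargerBetter(self):
            return True  # tell CrossValidator to maximise this metric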
9 votes • 2 answers

Error: Classification metrics can't handle a mix of multiclass-multioutput and multilabel-indicator targets

I am a newbie to machine learning in general. I am trying to do multilabel text classification. I have the original labels for these documents as well as the result of the classification (using the mlknn classifier), represented as one-hot encoding (19000…
Lossan • 411 • 1 • 8 • 16
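
That error is raised when sklearn.metrics infers different target types for y_true and y_pred; it generally goes away once both are plain 2-D 0/1 indicator arrays of the same shape (e.g. densify sparse classifier output first). A small sketch:

    # Sketch: both inputs as binary indicator arrays of the same shape,
    # so the metric's type inference agrees on "multilabel-indicator".
    import numpy as np
    from sklearn.metrics import f1_score

    y_true = np.array([[1, 0, 1], [0, 1, 0]])  # one-hot / indicator rows
    y_pred = np.array([[1, 0, 0], [0, 1, 0]])  # e.g. mlknn output, densified

    print(f1_score(y_true, y_pred, average="micro"))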
9 votes • 2 answers

Interpreting sklearn's GridSearchCV best score

I would like to know the difference between the score returned by GridSearchCV and the R2 metric calculated as below. In other cases the grid search score is highly negative (the same applies to cross_val_score), and I would be grateful for…
abu • 737 • 5 • 8 • 19
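
Part of the usual answer: best_score_ is the mean cross-validated score of the best parameter setting, while an R2 computed on data the refitted estimator has already seen is an in-sample number, so the two need not agree. A sketch showing where each comes from:

    # Sketch: mean held-out CV score vs. in-sample R^2 after refitting.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV

    X, y = make_regression(n_samples=200, n_features=10, noise=10,
                           random_state=0)
    search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5)
    search.fit(X, y)

    print(search.best_score_)  # mean R^2 across the 5 held-out folds
    print(search.score(X, y))  # in-sample R^2 of the refitted best model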
9 votes • 2 answers

TypeError: 'KFold' object is not iterable

I'm following one of the kernels on Kaggle, namely A kernel for Credit Card Fraud Detection. I reached the step where I need to perform KFold in order to find the best parameters for Logistic Regression. The following code is shown in…
kevinH • 345 • 2 • 4 • 7
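
This TypeError usually means code written for the removed sklearn.cross_validation API, where a KFold object was itself iterable, is running against sklearn.model_selection, where you iterate over kf.split(X) instead. A minimal sketch of the current API:

    # Sketch: the modern KFold API; iterate over split(), not the object.
    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(20).reshape(10, 2)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)

    for train_idx, test_idx in kf.split(X):  # not: `for ... in kf`
        print(train_idx, test_idx)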