Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets must cross over in successive rounds so that each data point gets a chance to be used for validation. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
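
For concreteness, here is a minimal sketch of k-fold cross-validation in scikit-learn; the dataset and estimator are illustrative placeholders, not part of the tag wiki:

    # Minimal sketch of 5-fold cross-validation (illustrative placeholders).
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Each round trains on 4 folds and validates on the held-out fold,
    # so every data point is validated exactly once.
    scores = cross_val_score(model, X, y, cv=5)
    print(scores.mean(), scores.std())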

2604 questions
11 votes • 1 answer

Combining grid search and cross-validation in scikit-learn

To improve Support Vector Machine outcomes, I have to use grid search to find better parameters, together with cross-validation. I'm not sure how to combine them in scikit-learn. Grid search searches for the best parameters…
postgres • 2,242 • 5 • 34 • 50
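
A common way to combine the two is GridSearchCV, which runs cross-validation for every parameter combination; a minimal sketch (the parameter grid below is an illustrative assumption, not a recommendation):

    # Sketch: grid search over SVC parameters, each scored by 5-fold CV.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}

    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X, y)  # one cross-validation run per parameter combination
    print(search.best_params_, search.best_score_)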
10 votes • 1 answer

Differences between RepeatedStratifiedKFold and StratifiedKFold in sklearn

I tried to read the docs for RepeatedStratifiedKFold and StratifiedKFold, but couldn't tell the difference between the two methods except that RepeatedStratifiedKFold repeats StratifiedKFold n times with different randomization in each…
Nemo • 1,124 • 2 • 16 • 39
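
That repetition is indeed the whole difference, and it can be seen directly by counting the generated splits; a small sketch with arbitrary array sizes:

    # Sketch: StratifiedKFold yields n_splits splits; RepeatedStratifiedKFold
    # yields n_splits * n_repeats splits, reshuffling before each repeat.
    import numpy as np
    from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold

    X = np.zeros((20, 3))
    y = np.array([0, 1] * 10)

    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    rskf = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

    print(sum(1 for _ in skf.split(X, y)))   # 5
    print(sum(1 for _ in rskf.split(X, y)))  # 15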
10 votes • 4 answers

ValueError: X has 24 features, but DecisionTreeClassifier is expecting 19 features as input

I'm trying to reproduce this GitHub project on my machine, on Topological Data Analysis (TDA). My steps: get best parameters from a cross-validation output; load my dataset; feature selection; extract topological features from the dataset for…
8-Bit Borges • 9,643 • 29 • 101 • 198
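
A mismatch like 24-vs-19 usually means the feature selection fitted during cross-validation was not re-applied at prediction time. A hedged sketch of the usual guard, a Pipeline (the selector and counts here are assumptions, not the project's actual setup):

    # Sketch: bundling feature selection with the classifier so predict()
    # sees the same 19-of-24 columns that were chosen during fit().
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.pipeline import Pipeline
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, n_features=24, random_state=0)

    pipe = Pipeline([
        ("select", SelectKBest(f_classif, k=19)),   # 24 -> 19 features
        ("tree", DecisionTreeClassifier(random_state=0)),
    ])
    pipe.fit(X, y)
    print(pipe.predict(X[:5]))  # selection is re-applied automatically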
10 votes • 1 answer

How to calculate feature importance in each model of cross-validation in sklearn

I am using RandomForestClassifier() with 10-fold cross-validation as follows: clf = RandomForestClassifier(random_state=42, class_weight="balanced") k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42) accuracy = cross_val_score(clf,…
EmJ • 4,398 • 9 • 44 • 105
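
Since cross_val_score only returns scores, collecting per-fold importances needs an explicit loop over the same splitter; a sketch mirroring the question's setup (the synthetic data is a placeholder):

    # Sketch: fit the classifier on each training split manually and
    # record feature_importances_ per fold.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=300, n_features=10, random_state=42)
    clf = RandomForestClassifier(random_state=42, class_weight="balanced")
    k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

    importances = []
    for train_idx, test_idx in k_fold.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        importances.append(clf.feature_importances_)  # one vector per fold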
10 votes • 1 answer

Why is cross_val_predict not appropriate for measuring the generalisation error?

When I train an SVC with cross-validation, y_pred = cross_val_predict(svc, X, y, cv=5, method='predict') cross_val_predict returns one class prediction for each element in X, so that y_pred.shape = (1000,) when m=1000. This makes sense, since cv=5…
zwithouta • 1,319 • 1 • 9 • 22
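
The short answer from the scikit-learn docs: cross_val_predict pools predictions from five differently trained models, so a single metric over the pooled vector is not a cross-validation estimate; cross_val_score averages per-fold scores instead. A sketch contrasting the two:

    # Sketch: per-fold scores (the usual generalisation estimate) versus
    # a single score over pooled cross_val_predict output.
    from sklearn.datasets import load_iris
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import cross_val_predict, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    svc = SVC()

    fold_scores = cross_val_score(svc, X, y, cv=5)  # preferred estimate
    y_pred = cross_val_predict(svc, X, y, cv=5)
    pooled = accuracy_score(y, y_pred)              # mixes 5 models' outputs
    print(fold_scores.mean(), pooled)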
10 votes • 3 answers

Not able to use Stratified K-Fold with a multi-label classifier

The following code is used to do K-Fold validation, but I am unable to train the model as it is throwing the error ValueError: Error when checking target: expected dense_14 to have shape (7,) but got array with shape (1,). My target variable has 7 classes. I…
Sai Pavan • 173 • 1 • 12
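
One common cause here is that StratifiedKFold only accepts a single-label target, not an (n_samples, 7) indicator matrix. A frequently suggested workaround, sketched below with placeholder data, is to stratify on a label derived from each row's label combination while still training on the full multi-label rows:

    # Sketch: derive one label per row from the indicator matrix and
    # stratify on that; the model itself still trains on Y.
    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    # 100 placeholder samples, 3 labels, 4 distinct combinations (25 each)
    Y = np.tile(np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 0]]), (25, 1))
    X = np.zeros((len(Y), 5))                       # placeholder features

    combo = np.array(["".join(map(str, row)) for row in Y])  # one label/row
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, combo):
        pass  # train on X[train_idx], Y[train_idx] (full multi-label rows)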
10 votes • 1 answer

Getting features in RFECV scikit-learn

Inspired by this: http://scikit-learn.org/stable/auto_examples/feature_selection/plot_rfe_with_cross_validation.html#sphx-glr-auto-examples-feature-selection-plot-rfe-with-cross-validation-py I am wondering if there is any way to get the features for…
Javiss • 765 • 3 • 10 • 24
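
After fitting, RFECV exposes the chosen features through support_, ranking_ and get_support(); a small sketch along the lines of the linked example:

    # Sketch: recovering which features RFECV kept.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFECV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    rfecv = RFECV(SVC(kernel="linear"), cv=5)
    rfecv.fit(X, y)

    print(rfecv.n_features_)   # how many features were kept
    print(rfecv.support_)      # boolean mask over the input columns
    print(rfecv.ranking_)      # rank 1 marks the selected features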
10 votes • 3 answers

Applying k-fold Cross Validation model using caret package

Let me start by saying that I have read many posts on Cross Validation and it seems there is much confusion out there. My understanding of it is simply this: perform k-fold Cross Validation, i.e. 10 folds, to understand the average error across…
pmanDS • 193 • 1 • 2 • 10
10 votes • 3 answers

GridSearchCV on LogisticRegression in scikit-learn

I am trying to optimize a logistic regression function in scikit-learn by using a cross-validated grid parameter search, but I can't seem to implement it. It says that Logistic Regression does not implement get_params(), but the documentation…
10 votes • 1 answer

Collecting out-of-fold predictions from a caret model

I want to use the out-of-fold predictions from a caret model to train a second-stage model that includes some of the original predictors. I can collect the out-of-fold predictions as follows: #Load…
Zach • 29,791 • 35 • 142 • 201
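
The question itself is about caret (R); purely for comparison, the scikit-learn analogue of this stacking pattern uses cross_val_predict to generate out-of-fold predictions and then trains a second-stage model on those predictions plus some original predictors. A sketch, not the caret workflow:

    # Sketch (scikit-learn analogue, not caret): out-of-fold predictions
    # from a first-stage model feed a second-stage model.
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_predict

    X, y = make_regression(n_samples=200, n_features=5, random_state=0)

    # Out-of-fold predictions: each sample is predicted by a model that
    # never saw it during training.
    oof = cross_val_predict(RandomForestRegressor(random_state=0), X, y, cv=5)

    X_stage2 = np.column_stack([oof, X[:, :2]])  # OOF preds + 2 original cols
    stage2 = LinearRegression().fit(X_stage2, y)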
9 votes • 1 answer

Custom Scoring Function in sklearn Cross Validate

I would like to use a custom function for cross_validate which uses a specific y_test to compute precision; this is a different y_test from the actual target y_test. I have tried a few approaches with make_scorer, but I don't know how to actually…
Tartaglia • 949 • 14 • 20
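
For the standard case, make_scorer wraps any metric of (y_true, y_pred); a minimal sketch is below. Note the caveat relevant to the question: a scorer built this way always receives the fold's own y_test, so scoring against a different target would need a hand-rolled scorer callable instead. The metric here is a placeholder:

    # Sketch: a custom metric passed to cross_validate via make_scorer.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import make_scorer, precision_score
    from sklearn.model_selection import cross_validate

    def my_precision(y_true, y_pred):
        # placeholder metric: macro-averaged precision
        return precision_score(y_true, y_pred, average="macro")

    X, y = load_iris(return_X_y=True)
    res = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=5,
                         scoring={"prec": make_scorer(my_precision)})
    print(res["test_prec"])  # one precision value per fold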
9 votes • 3 answers

Custom Evaluator in PySpark

I want to optimize the hyperparameters of a PySpark Pipeline using a ranking metric (MAP@k). I have seen in the documentation how to use the metrics defined in the Evaluation (Scala), but I need to define a custom evaluator class because MAP@k is…
Amanda • 941 • 2 • 12 • 28
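
One way this is usually done is to subclass pyspark.ml.evaluation.Evaluator and implement _evaluate; the sketch below keeps that structure but uses placeholder logic rather than a real MAP@k computation:

    # Sketch: a custom evaluator that CrossValidator can call.
    # The _evaluate body is placeholder logic, not an actual MAP@k.
    from pyspark.ml.evaluation import Evaluator

    class MapAtKEvaluator(Evaluator):
        def __init__(self, k=10, predictionCol="prediction", labelCol="label"):
            super().__init__()
            self.k = k
            self.predictionCol = predictionCol
            self.labelCol = labelCol

        def _evaluate(self, dataset):
            # compute the metric from the DataFrame here (placeholder)
            rows = dataset.select(self.predictionCol, self.labelCol)
            return float(rows.count())

        def isLargerBetter(self):
            return True  # tell CrossValidator to maximise this metric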
9 votes • 2 answers

Error: Classification metrics can't handle a mix of multiclass-multioutput and multilabel-indicator targets

I am a newbie to machine learning in general. I am trying to do multilabel text classification. I have the original labels for these documents as well as the result of the classification (using the mlknn classifier), represented as one-hot encoding (19000…
Lossan • 411 • 1 • 8 • 16
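
That error is raised when sklearn.metrics infers different target types for y_true and y_pred; it generally goes away once both are plain 2-D 0/1 indicator arrays of the same shape (e.g. densify sparse classifier output first). A small sketch:

    # Sketch: both inputs as binary indicator arrays of the same shape,
    # so the metric's type inference agrees on "multilabel-indicator".
    import numpy as np
    from sklearn.metrics import f1_score

    y_true = np.array([[1, 0, 1], [0, 1, 0]])  # one-hot / indicator rows
    y_pred = np.array([[1, 0, 0], [0, 1, 0]])  # e.g. mlknn output, densified

    print(f1_score(y_true, y_pred, average="micro"))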
9 votes • 2 answers

Interpreting sklearn's GridSearchCV best score

I would like to know the difference between the score returned by GridSearchCV and the R2 metric calculated as below. In other cases the grid search score is highly negative (the same applies to cross_val_score), and I would be grateful for…
abu • 737 • 5 • 8 • 19
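
Part of the usual answer: best_score_ is the mean cross-validated score of the best parameter setting, while an R2 computed on data the refitted estimator has already seen is an in-sample number, so the two need not agree. A sketch showing where each comes from:

    # Sketch: mean held-out CV score vs. in-sample R^2 after refitting.
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV

    X, y = make_regression(n_samples=200, n_features=10, noise=10,
                           random_state=0)
    search = GridSearchCV(Ridge(), {"alpha": [0.1, 1.0, 10.0]}, cv=5)
    search.fit(X, y)

    print(search.best_score_)  # mean R^2 across the 5 held-out folds
    print(search.score(X, y))  # in-sample R^2 of the refitted best model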
9 votes • 2 answers

TypeError: 'KFold' object is not iterable

I'm following one of the kernels on Kaggle, namely A kernel for Credit Card Fraud Detection. I reached the step where I need to perform KFold in order to find the best parameters for Logistic Regression. The following code is shown in…
kevinH • 345 • 2 • 4 • 7
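
This TypeError usually means code written for the removed sklearn.cross_validation API, where a KFold object was itself iterable, is running against sklearn.model_selection, where you iterate over kf.split(X) instead. A minimal sketch of the current API:

    # Sketch: the modern KFold API; iterate over split(), not the object.
    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(20).reshape(10, 2)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)

    for train_idx, test_idx in kf.split(X):  # not: `for ... in kf`
        print(train_idx, test_idx)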