Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that each data point gets a chance to be validated against. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
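As a concrete illustration, here is a minimal k-fold sketch with scikit-learn (the dataset and model are placeholders):

    # Minimal k-fold cross-validation sketch; dataset and model are placeholders.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)
    # With cv=5, each of the 5 folds serves as the validation set exactly once.
    scores = cross_val_score(model, X, y, cv=5)
    print(scores.mean(), scores.std())
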

2604 questions
23 votes · 5 answers

Cross Validation in Keras

I'm implementing a Multilayer Perceptron in Keras and using scikit-learn to perform cross-validation. For this, I was inspired by the code found in the issue Cross Validation in Keras: from sklearn.cross_validation import StratifiedKFold def…
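A minimal sketch of one common approach, not necessarily the asker's code: loop over StratifiedKFold splits and rebuild the Keras model each round so no weights carry over (note that sklearn.cross_validation has since been replaced by sklearn.model_selection; the data below is a random placeholder):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold  # modern home of StratifiedKFold
    from tensorflow import keras

    X = np.random.rand(200, 10)          # placeholder data
    y = np.random.randint(0, 2, 200)

    def build_model(n_features):
        # Small MLP, rebuilt from scratch for every fold.
        model = keras.Sequential([
            keras.layers.Input(shape=(n_features,)),
            keras.layers.Dense(32, activation="relu"),
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy",
                      metrics=["accuracy"])
        return model

    scores = []
    for train_idx, val_idx in StratifiedKFold(n_splits=5, shuffle=True).split(X, y):
        model = build_model(X.shape[1])  # fresh weights each round
        model.fit(X[train_idx], y[train_idx], epochs=10, verbose=0)
        scores.append(model.evaluate(X[val_idx], y[val_idx], verbose=0)[1])
    print(np.mean(scores))
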
23 votes · 3 answers

Cross-validation in LightGBM

How are we supposed to use the dictionary output from lightgbm.cv to improve our predictions? Here's an example - we train our cv model using the code below: cv_mod = lgb.cv(params, d_train, 500, …
Nlind (331 rep · 1 gold · 3 silver · 4 bronze)
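A hedged sketch of the usual pattern: run lgb.cv with early stopping to choose the number of boosting rounds, then retrain on all the data with that count. The data is a placeholder, and the metric key name in the returned dictionary varies across LightGBM versions, hence the endswith lookup:

    import lightgbm as lgb
    import numpy as np

    X = np.random.rand(500, 10)          # placeholder data
    y = np.random.randint(0, 2, 500)
    d_train = lgb.Dataset(X, label=y)
    params = {"objective": "binary", "metric": "binary_logloss", "verbosity": -1}

    cv_mod = lgb.cv(params, d_train, num_boost_round=500,
                    callbacks=[lgb.early_stopping(stopping_rounds=25)])
    # The mean-metric key is e.g. 'valid binary_logloss-mean' in recent versions.
    mean_key = next(k for k in cv_mod if k.endswith("-mean"))
    best_rounds = len(cv_mod[mean_key])  # rounds surviving early stopping
    final_model = lgb.train(params, d_train, num_boost_round=best_rounds)
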
23 votes · 4 answers

Sklearn preprocessing - PolynomialFeatures - How to keep column names/headers of the output array / dataframe

TLDR: How to get headers for the output numpy array from the sklearn.preprocessing.PolynomialFeatures() function? Let's say I have the following code... import pandas as pd import numpy as np from sklearn import preprocessing as pp a =…
Afflatus (2,302 rep · 5 gold · 25 silver · 40 bronze)
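A minimal sketch, assuming scikit-learn 1.0 or later, where get_feature_names_out() maps the expanded array back to readable column names:

    import pandas as pd
    from sklearn.preprocessing import PolynomialFeatures

    a = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [4.0, 5.0, 6.0]})  # toy frame
    poly = PolynomialFeatures(degree=2)
    out = pd.DataFrame(poly.fit_transform(a),
                       columns=poly.get_feature_names_out(a.columns))
    print(list(out.columns))  # ['1', 'x1', 'x2', 'x1^2', 'x1 x2', 'x2^2']
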
22 votes · 1 answer

Using sklearn cross_val_score and kfolds to fit and help predict model

I'm trying to understand k-fold cross-validation from the sklearn Python module. I understand the basic flow: instantiate a model, e.g. model = LogisticRegression(); fit the model, e.g. model.fit(xtrain, ytrain); predict, e.g.…
hselbie (1,749 rep · 9 gold · 24 silver · 40 bronze)
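A minimal sketch of how the pieces usually fit together: cross_val_score fits internal clones of the model purely for scoring, and a separate fit() on the full training data produces the model actually used for predictions (the dataset is a placeholder):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    model = LogisticRegression(max_iter=5000)
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    print(cross_val_score(model, X, y, cv=cv))  # per-fold evaluation only

    model.fit(X, y)               # fit once on everything for real predictions
    preds = model.predict(X)
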
22 votes · 4 answers

Understanding Python xgboost cv

I would like to use the xgboost cv function to find the best parameters for my training data set. I am confused by the API. How do I find the best parameters? Is this similar to the sklearn grid_search cross-validation function? How can I find which…
kilojoules (9,768 rep · 18 gold · 77 silver · 149 bronze)
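A hedged sketch: xgb.cv evaluates a single parameter set across folds (it is not a grid search like sklearn's GridSearchCV); the length of the returned DataFrame after early stopping gives a reasonable boosting-round count for those parameters. Placeholder data:

    import numpy as np
    import xgboost as xgb

    X = np.random.rand(500, 10)
    y = np.random.randint(0, 2, 500)
    dtrain = xgb.DMatrix(X, label=y)
    params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}

    cv_df = xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
                   early_stopping_rounds=25, metrics="logloss")
    # Rows = boosting rounds kept; the last row holds the best mean validation loss.
    print(len(cv_df), cv_df["test-logloss-mean"].iloc[-1])
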
22 votes · 2 answers

How to cross-validate a RandomForest model?

I want to evaluate a random forest being trained on some data. Is there any utility in Apache Spark to do the same or do I have to perform cross validation manually?
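Spark ML does ship such a utility. A hedged PySpark sketch, assuming a prepared DataFrame df with 'features' and 'label' columns:

    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    rf = RandomForestClassifier(labelCol="label", featuresCol="features")
    grid = ParamGridBuilder().addGrid(rf.numTrees, [20, 50]).build()
    cv = CrossValidator(estimator=rf, estimatorParamMaps=grid,
                        evaluator=BinaryClassificationEvaluator(),
                        numFolds=5)
    cv_model = cv.fit(df)   # df is assumed to exist with the columns above
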
21 votes · 2 answers

scikit-learn GridSearchCV with multiple repetitions

I'm trying to get the best set of parameters for an SVR model. I'd like to use the GridSearchCV over different values of C. However, from the previous test, I noticed that the split into the Training/Test set highly influences the overall…
Titus Pullo (3,751 rep · 15 gold · 45 silver · 65 bronze)
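A hedged sketch of one way to damp that sensitivity: pass RepeatedKFold as the cv argument so the whole k-fold split is repeated with different shuffles (placeholder data):

    import numpy as np
    from sklearn.model_selection import GridSearchCV, RepeatedKFold
    from sklearn.svm import SVR

    X = np.random.rand(100, 5)
    y = np.random.rand(100)

    cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
    search = GridSearchCV(SVR(), {"C": [0.1, 1, 10]}, cv=cv)
    search.fit(X, y)
    print(search.best_params_)   # averaged over 50 train/test partitions
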
20 votes · 3 answers

What does KFold in python exactly do?

I am looking at this tutorial: https://www.dataquest.io/mission/74/getting-started-with-kaggle I got to part 9, making predictions. There, some data in a dataframe called titanic is divided into folds using: # Generate cross…
user (2,015 rep · 6 gold · 22 silver · 39 bronze)
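A minimal sketch of what KFold actually yields: arrays of row indices, not data, with every index landing in the test fold exactly once:

    import numpy as np
    from sklearn.model_selection import KFold

    X = np.arange(12).reshape(6, 2)   # six toy rows
    for train_idx, test_idx in KFold(n_splits=3).split(X):
        print("train:", train_idx, "test:", test_idx)
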
19 votes · 1 answer

(Python - sklearn) How to pass parameters to the custom ModelTransformer class by gridsearchcv

Below is my pipeline, and it seems I can't pass parameters to my models using the ModelTransformer class, which I took from this link (http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html). The error message…
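A hedged sketch of the general mechanism the question depends on: GridSearchCV reaches parameters of nested pipeline steps through double-underscore names ('<step name>__<parameter>'); the steps here are standard stand-ins, not the asker's ModelTransformer:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])
    # 'model__alpha' routes the value to the alpha parameter of the 'model' step.
    grid = GridSearchCV(pipe, {"model__alpha": [0.1, 1.0, 10.0]}, cv=3)
    grid.fit(np.random.rand(60, 4), np.random.rand(60))  # placeholder data
    print(grid.best_params_)
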
18 votes · 2 answers

Cross-validation for grouped time-series (panel) data

I work with panel data: I observe a number of units (e.g. people) over time; for each unit, I have records for the same fixed time intervals. When splitting the data into train and test sets, we need to make sure that both sets are disjoint and…
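A hedged sketch of one manual approach (there is no single canonical utility for this): cut the panel on time boundaries so every unit contributes to both sets but no time period is split across train and test:

    import pandas as pd

    # Toy panel: one row per (unit, time) pair.
    panel = pd.DataFrame({
        "unit": [1, 1, 1, 2, 2, 2],
        "time": [1, 2, 3, 1, 2, 3],
        "y":    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    })
    for cutoff in sorted(panel["time"].unique())[1:]:
        train = panel[panel["time"] < cutoff]   # strictly earlier periods
        test = panel[panel["time"] == cutoff]   # the held-out period
        print(f"cutoff={cutoff}: {len(train)} train rows, {len(test)} test rows")
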
18 votes · 2 answers

Cross-validation in sklearn: do I need to call fit() as well as cross_val_score()?

I would like to use k-fold cross validation while learning a model. So far I am doing it like this: # splitting dataset into training and test sets X_train, X_test, y_train, y_test = train_test_split(dataset_1, df1['label'], test_size=0.25,…
torayeff (9,296 rep · 19 gold · 69 silver · 103 bronze)
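A minimal sketch of the usual answer's shape: cross_val_score fits fresh clones internally, so no extra fit() is needed for the scores themselves; fit() only matters for the final model you keep:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    clf = DecisionTreeClassifier()
    print(cross_val_score(clf, X_train, y_train, cv=5))  # clf itself stays unfitted
    clf.fit(X_train, y_train)        # now fit the model you will actually use
    print(clf.score(X_test, y_test))
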
18 votes · 1 answer

ValueError: n_splits=10 cannot be greater than the number of members in each class

I am trying to run the following code: from sklearn.model_selection import StratifiedKFold X = ["hey", "join now", "hello", "join today", "join us now", "not today", "join this trial", " hey hey", " no", "hola", "bye", "join today", "no","join…
SFC (733 rep · 2 gold · 11 silver · 22 bronze)
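A hedged sketch that reproduces the constraint: StratifiedKFold needs at least n_splits samples of every class, so either lower n_splits to the rarest class's count or collect more examples of that class (toy labels below):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    X = np.zeros((7, 1))                   # toy features
    y = np.array([0, 0, 0, 0, 1, 1, 1])    # rarest class has only 3 members

    # StratifiedKFold(n_splits=10).split(X, y) would raise this exact ValueError.
    n_splits = min(10, np.bincount(y).min())
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits).split(X, y):
        print(test_idx)
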
18 votes · 1 answer

sklearn cross_val_score gives lower accuracy than manual cross validation

I'm working on a text classification problem, which I've set up like so (I've left out the data processing steps for concision, but they'll produce a dataframe called data with columns X and y): import sklearn.model_selection as ms from…
Empiromancer (3,778 rep · 1 gold · 22 silver · 53 bronze)
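A hedged sketch of the usual diagnosis: give the manual loop and cross_val_score the identical splitter and scorer; if the folds and metric match, any remaining gap usually points to preprocessing fitted outside the loop (data leakage). Toy text data below:

    import numpy as np
    import sklearn.model_selection as ms
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    X = np.array(["good movie", "bad movie", "great film", "awful film"] * 10)
    y = np.array([1, 0, 1, 0] * 10)

    cv = ms.StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    pipe = make_pipeline(CountVectorizer(), MultinomialNB())  # vectorizer refit per fold

    auto = ms.cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
    manual = [pipe.fit(X[tr], y[tr]).score(X[te], y[te])
              for tr, te in cv.split(X, y)]
    print(auto.mean(), np.mean(manual))   # identical folds -> identical means
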
17 votes · 2 answers

Sklearn custom transformers: difference between using FunctionTransformer and subclassing TransformerMixin

In order to do proper CV, it is advisable to use pipelines so that the same transformations can be applied to each fold in the CV. I can define custom transformations using either sklearn.preprocessing.FunctionTransformer or by subclassing…
artemis (581 rep · 1 gold · 4 silver · 13 bronze)
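A minimal sketch of the practical difference: FunctionTransformer wraps a stateless function, while a TransformerMixin subclass can learn per-fold state in fit() (the MeanCenterer below is a hypothetical example, not sklearn API):

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.preprocessing import FunctionTransformer

    log_tf = FunctionTransformer(np.log1p)   # stateless: nothing learned in fit

    class MeanCenterer(BaseEstimator, TransformerMixin):
        def fit(self, X, y=None):
            self.means_ = np.asarray(X).mean(axis=0)  # state learned per fold
            return self

        def transform(self, X):
            return np.asarray(X) - self.means_

    X = np.random.rand(10, 3)                # placeholder data
    print(log_tf.fit_transform(X).shape, MeanCenterer().fit_transform(X).shape)
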
17 votes · 2 answers

How to compute precision, recall and F1 score of an imbalanced dataset with k-fold cross-validation?

I have an imbalanced dataset for a binary classification problem. I have built a Random Forest Classifier and used k-fold cross-validation with 10 folds. kfold = model_selection.KFold(n_splits=10,…
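A hedged sketch of one way to get all three metrics per fold in a single pass via cross_validate's multi-scorer interface; StratifiedKFold keeps the class ratio in every fold, which matters for imbalanced labels (placeholder data):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_validate

    X = np.random.rand(200, 8)                        # placeholder features
    y = np.random.choice([0, 1], 200, p=[0.8, 0.2])   # imbalanced labels

    kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    res = cross_validate(RandomForestClassifier(), X, y, cv=kfold,
                         scoring=["precision", "recall", "f1"])
    print(res["test_precision"].mean(),
          res["test_recall"].mean(),
          res["test_f1"].mean())
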