Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that each data point gets a chance to be validated against. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.

2604 questions
33
votes
6 answers

How to implement walk-forward testing in sklearn?

In sklearn, GridSearchCV can take a pipeline as a parameter to find the best estimator through cross validation. However, the usual cross validation is like this: to cross validate time series data, the training and testing data are often split…
PhilChang
  • 2,591
  • 1
  • 16
  • 18
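A minimal sketch of one common answer, using sklearn's TimeSeriesSplit (the placeholder data and the Ridge/alpha grid are assumptions, not from the question): passing a TimeSeriesSplit instance as cv makes GridSearchCV evaluate each candidate on forward-in-time splits instead of shuffled folds.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 3)   # placeholder time-ordered features
y = np.random.rand(100)      # placeholder targets

pipe = Pipeline([("scale", StandardScaler()), ("model", Ridge())])
search = GridSearchCV(
    pipe,
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    cv=TimeSeriesSplit(n_splits=5),  # each test fold lies after its train fold
)
search.fit(X, y)
print(search.best_params_)
```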
32
votes
2 answers

Does TensorFlow have cross validation implemented?

I was thinking of trying to choose hyperparameters (like regularization, for example) using cross validation, or maybe training multiple initializations of a model and then choosing the model with the highest cross validation accuracy. Implementing k-fold or…
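TensorFlow itself ships no cross-validation utility; a common pattern, sketched below with placeholder data and a toy tf.keras model, is to let sklearn's KFold hand out the indices and rebuild the model for every fold.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

X = np.random.rand(150, 4).astype("float32")   # placeholder data
y = np.random.randint(0, 3, size=150)

scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Rebuild the model from scratch each fold so no state leaks across folds
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(X[train_idx], y[train_idx], epochs=5, verbose=0)
    _, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
    scores.append(acc)

print("mean CV accuracy:", np.mean(scores))
```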
32
votes
4 answers

predict_proba for a cross-validated model

I would like to predict probabilities from a logistic regression model with cross-validation. I know you can get the cross-validation scores, but is it possible to return the values from predict_proba instead of the scores? # imports from…
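A minimal sketch of exactly this: sklearn's cross_val_predict accepts method="predict_proba", so each row of the result is the class-probability vector predicted while that sample sat in the held-out fold.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = load_iris(return_X_y=True)
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=5, method="predict_proba")
print(proba.shape)  # (n_samples, n_classes); each row predicted out-of-fold
```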
30
votes
3 answers

How to perform k-fold cross validation with tensorflow?

I am following the IRIS example of tensorflow. In my case, all the data is in a single CSV file, not pre-split, and I want to apply k-fold cross validation to that data. I have data_set =…
mommomonthewind
  • 4,390
  • 11
  • 46
  • 74
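A minimal sketch, assuming a hypothetical iris.csv with the label in the last column: load the file once, then let KFold hand out row indices; each round's arrays can be fed to the TensorFlow estimator exactly as in the IRIS example.

```python
import pandas as pd
from sklearn.model_selection import KFold

data = pd.read_csv("iris.csv")   # hypothetical single CSV, label in last column
X = data.iloc[:, :-1].to_numpy()
y = data.iloc[:, -1].to_numpy()

kf = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    X_train, y_train = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]
    # feed these arrays to the TensorFlow estimator as in the IRIS example
    print(f"fold {fold}: {len(train_idx)} train rows, {len(val_idx)} val rows")
```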
30
votes
5 answers

How to use k-fold cross validation in scikit with a naive Bayes classifier and NLTK

I have a small corpus and I want to calculate the accuracy of a naive Bayes classifier using 10-fold cross validation. How can I do it?
user2284345
  • 501
  • 2
  • 5
  • 9
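A minimal sketch with a made-up toy corpus: wrapping the vectorizer and MultinomialNB in a pipeline lets cross_val_score handle the 10 folds, whether the documents come from NLTK or anywhere else.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["good movie", "bad movie", "great film", "awful film"] * 5   # toy corpus
labels = ["pos", "neg", "pos", "neg"] * 5

clf = make_pipeline(CountVectorizer(), MultinomialNB())
scores = cross_val_score(clf, docs, labels, cv=10)   # 10-fold accuracy
print(scores.mean())
```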
29
votes
11 answers

How to extract best parameters from a CrossValidatorModel

I want to find the parameters from ParamGridBuilder that make the best model in CrossValidator in Spark 1.4.x. In the Pipeline example in the Spark documentation, they add different parameters (numFeatures, regParam) by using ParamGridBuilder in the Pipeline.…
Mohammad
  • 1,006
  • 2
  • 15
  • 29
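A sketch of one common answer, assuming a recent PySpark and a fitted CrossValidatorModel named cvModel wrapping a Pipeline (the question targets Spark 1.4, whose Python API exposed less): pair avgMetrics with the parameter grid to recover the winning combination.

```python
# cvModel is assumed to be a fitted pyspark.ml.tuning.CrossValidatorModel.
# avgMetrics[i] is the mean CV metric for getEstimatorParamMaps()[i];
# use max for higher-is-better metrics (e.g. areaUnderROC), min for RMSE.
best_idx = max(range(len(cvModel.avgMetrics)), key=lambda i: cvModel.avgMetrics[i])
print(cvModel.getEstimatorParamMaps()[best_idx])        # winning param map
print(cvModel.bestModel.stages[-1].extractParamMap())   # params of the refit final stage
```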
29
votes
2 answers

Topic models: cross validation with loglikelihood or perplexity

I'm clustering documents using topic modeling. I need to come up with the optimal number of topics. So, I decided to do ten-fold cross validation with 10, 20, …, 60 topics. I have divided my corpus into ten batches and set aside one batch for a holdout…
user37874
  • 415
  • 1
  • 5
  • 11
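The question's library is unspecified; as a sketch of the same procedure in Python/gensim on a toy corpus, train on the nine batches and score the held-out batch with log_perplexity for each candidate topic count.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

texts = [["topic", "model", "words"], ["more", "toy", "words"]] * 50   # toy corpus
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]
holdout, train = corpus[:10], corpus[10:]   # one batch held out, nine for training

for k in (10, 20, 30, 40, 50, 60):
    lda = LdaModel(train, num_topics=k, id2word=dictionary, passes=2)
    # log_perplexity returns the per-word likelihood bound on the holdout;
    # a higher bound (lower perplexity) means the topic count fits better
    print(k, lda.log_perplexity(holdout))
```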
27
votes
3 answers

StratifiedKFold vs KFold in scikit-learn

I use this code to test KFold and StratifiedKFold. import numpy as np from sklearn.model_selection import KFold,StratifiedKFold X = np.array([ [1,2,3,4], [11,12,13,14], [21,22,23,24], [31,32,33,34], [41,42,43,44], …
user9270170
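A minimal contrast on a tiny labelled array: KFold cuts purely by position, while StratifiedKFold also keeps each fold's class ratio close to the full data's, which is why its split() needs y.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 1, 1, 1])

for name, cv in [("KFold", KFold(n_splits=3)),
                 ("StratifiedKFold", StratifiedKFold(n_splits=3))]:
    print(name)
    for train_idx, test_idx in cv.split(X, y):
        # KFold's test folds here carry labels [0 0], [0 1], [1 1];
        # StratifiedKFold's each contain one sample of every class
        print("  test:", test_idx, "labels:", y[test_idx])
```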
26
votes
5 answers

Split tensor into training and test sets

Let's say I've read in a text file using a TextLineReader. Is there some way to split this into train and test sets in TensorFlow? Something like: def read_my_file_format(filename_queue): reader = tf.TextLineReader() key, record_string =…
Luke
  • 6,699
  • 13
  • 50
  • 88
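A sketch using the modern tf.data API rather than the TF 1.x TextLineReader queue from the question (data.txt and the sizes are placeholders): shuffle once with a fixed seed, then carve off the test set with take and skip.

```python
import tensorflow as tf

dataset = tf.data.TextLineDataset("data.txt")   # hypothetical input file

# Shuffle deterministically once so the two subsets never overlap across epochs
dataset = dataset.shuffle(10_000, seed=0, reshuffle_each_iteration=False)

test_size = 1_000
test_set = dataset.take(test_size)    # first test_size shuffled lines
train_set = dataset.skip(test_size)   # everything after them
```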
25
votes
2 answers

Does GridSearchCV perform cross-validation?

I'm currently working on a problem which compares the performance of three different machine learning algorithms on the same dataset. I divided the dataset into 70/30 training/testing sets and then performed a grid search for the best parameters of each…
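Yes: a minimal sketch of the usual arrangement, where GridSearchCV runs cv-fold cross-validation inside the 70% training portion for every parameter combination and refits the winner on that whole portion (refit=True is the default), leaving the 30% test set untouched.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)   # 5-fold CV runs here, on the 70% split only
print(search.best_params_, search.score(X_test, y_test))   # untouched 30%
```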
25
votes
3 answers

Put customized functions in Sklearn pipeline

In my classification scheme, there are several steps, including: SMOTE (Synthetic Minority Over-sampling Technique), the Fisher criterion for feature selection, standardization (Z-score normalisation), and SVC (Support Vector Classifier). The main parameters to…
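A minimal sketch of a custom step: any object exposing fit and transform can sit in a Pipeline, and deriving from BaseEstimator/TransformerMixin keeps it compatible with GridSearchCV. The TopKFisher class below is a hypothetical two-class Fisher-style filter, not the asker's code; note that resamplers like SMOTE change y as well, so they need imbalanced-learn's Pipeline rather than scikit-learn's.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

class TopKFisher(BaseEstimator, TransformerMixin):
    """Hypothetical two-class Fisher-style filter keeping the k best columns."""
    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y):
        m0, m1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
        v = X[y == 0].var(axis=0) + X[y == 1].var(axis=0) + 1e-12
        self.keep_ = np.argsort(-((m0 - m1) ** 2) / v)[: self.k]  # top-k scores
        return self

    def transform(self, X):
        return X[:, self.keep_]

X = np.random.rand(40, 6)   # placeholder data
y = np.array([0, 1] * 20)
pipe = Pipeline([("select", TopKFisher(k=2)), ("svc", SVC())])
pipe.fit(X, y)              # the custom step runs like any built-in one
```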
25
votes
2 answers

How to perform random forest/cross validation in R

I'm unable to find a way of performing cross validation on the regression random forest model I'm trying to produce. So I have a dataset containing 1664 explanatory variables (different chemical properties), with one response variable (retention…
user2062207
  • 955
  • 4
  • 18
  • 34
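The question targets R's randomForest; as a sketch of the same idea in Python (this page's dominant language), scikit-learn's cross_val_score does the fold bookkeeping for a regression forest directly, with placeholder data standing in for the 1664 descriptors.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X = np.random.rand(200, 10)   # placeholder for the 1664 chemical descriptors
y = np.random.rand(200)       # placeholder retention values

scores = cross_val_score(RandomForestRegressor(n_estimators=100, random_state=0),
                         X, y, cv=10, scoring="r2")
print(scores.mean())
```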
24
votes
1 answer

How to use lightgbm.cv for regression?

I want to do cross validation for a LightGBM model with lgb.Dataset and use early_stopping_rounds. The following approach works without a problem with XGBoost's xgboost.cv. I prefer not to use scikit-learn's approach with GridSearchCV, because it…
Marius
  • 409
  • 1
  • 5
  • 9
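A minimal sketch on placeholder data: lgb.cv runs the folds itself when given an lgb.Dataset, with the objective set to a regression loss in params; in LightGBM >= 4 early stopping is passed as a callback as shown, while older versions took an early_stopping_rounds argument instead.

```python
import lightgbm as lgb
import numpy as np

X = np.random.rand(500, 8)   # placeholder features
y = np.random.rand(500)      # placeholder continuous target

train_set = lgb.Dataset(X, label=y)
params = {"objective": "regression", "metric": "rmse", "verbosity": -1}

cv_results = lgb.cv(params, train_set, num_boost_round=500, nfold=5,
                    callbacks=[lgb.early_stopping(stopping_rounds=20)])
# Each value is one metric's per-round history, truncated where stopping hit
print("best round:", len(next(iter(cv_results.values()))))
```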
24
votes
2 answers

Deprecation warnings from sklearn

I am using cross_validation from sklearn: from sklearn.cross_validation import train_test_split. I get the warning below: cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection…
Biranchi
  • 16,120
  • 23
  • 124
  • 161
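The fix is a one-line import change: the cross_validation module was deprecated in scikit-learn 0.18 and its helpers moved to model_selection.

```python
# Old (emits DeprecationWarning since scikit-learn 0.18):
# from sklearn.cross_validation import train_test_split

# New home of the same helper:
from sklearn.model_selection import train_test_split
```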
24
votes
1 answer

cross validation + decision trees in sklearn

Attempting to create a decision tree with cross validation using sklearn and pandas. My question is: in the code below, the cross validation splits the data, which I then use for both training and testing. I will be attempting to find the best depth…
razeal113
  • 451
  • 1
  • 4
  • 13
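A minimal sketch of depth selection on a stand-in dataset (iris here): for each candidate max_depth, cross_val_score does the splitting and scoring, so no sample is used for both training and testing within the same fold.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for depth in range(1, 8):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(tree, X, y, cv=10)   # fresh 10-fold CV per candidate
    print(depth, scores.mean())
```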