Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that every data point gets a chance to appear in the validation set. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
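As a quick illustration of the k-fold procedure described above, here is a minimal sketch using scikit-learn's KFold on a synthetic dataset (the data and the logistic-regression model are placeholders, not taken from any question below):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    X = np.random.rand(100, 5)           # toy feature matrix
    y = np.random.randint(0, 2, 100)     # toy binary labels

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, val_idx in kf.split(X):
        model = LogisticRegression().fit(X[train_idx], y[train_idx])
        # over the 5 rounds, every data point appears in the validation fold exactly once
        scores.append(model.score(X[val_idx], y[val_idx]))

    print(np.mean(scores))               # average held-out accuracy across folds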

2604 questions
1 vote
1 answer

Score of RFECV() in python scikit-learn

The scikit-learn library supports recursive feature elimination (RFE) and its cross-validated version (RFECV). RFECV is very useful for me because it selects a small feature set, but I wonder how the cross-validation of RFE is done. RFE is a way to remove the least important…
z991 • 713 • 1 • 9 • 21
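For the RFECV question above, the gist (as I read the scikit-learn docs) is that RFE is rerun on each training split, the candidate feature-subset sizes are scored on the corresponding held-out folds, and the size with the best mean score is then refit on all the data. A hedged sketch on synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFECV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                               random_state=0)

    # RFE runs inside each CV split; the cross-validated scores pick the subset size
    selector = RFECV(estimator=SVC(kernel="linear"), step=1, cv=5)
    selector.fit(X, y)

    print(selector.n_features_)   # number of features judged best by cross-validation
    print(selector.support_)      # boolean mask over the original 20 features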
1 vote
1 answer

Cross Validation sklearn - How the splits are performed?

I'm currently dealing with a classification problem and have a question about the cross validation functionality of the sklearn / scikit-learn Python module. Consider the following call: cv_scores = cross_validation.cross_val_score(rfc, X, y,…
Patrick Weiß • 436 • 9 • 23
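On how the splits are performed: when a classifier is passed with an integer cv, scikit-learn uses stratified, unshuffled folds. The sketch below assumes the modern sklearn.model_selection API (the question uses the older cross_validation module, which behaved the same way on this point):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    X, y = load_iris(return_X_y=True)
    rfc = RandomForestClassifier(random_state=0)

    # for a classifier, cv=5 means StratifiedKFold(n_splits=5) without shuffling,
    # so class proportions are preserved in every fold
    scores = cross_val_score(rfc, X, y, cv=5)
    same = cross_val_score(rfc, X, y, cv=StratifiedKFold(n_splits=5))

    print(scores)
    print(same)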
1 vote
2 answers

Holdout vs. K fold cross validation in libsvm

I am doing a classification task using libsvm. With 10-fold cross-validation the F1 score is 0.80. However, when I split the training dataset into two parts (one for training and the other for testing, which I call the holdout test set), the…
user2161903 • 577 • 1 • 6 • 22
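Regarding holdout vs. 10-fold: a single holdout split gives one noisy estimate, whereas the cross-validated F1 is an average over ten held-out folds, so some disagreement between the two numbers is expected. A rough scikit-learn analogue of the setup (an assumption on my part, since the question itself uses libsvm directly):

    from sklearn.datasets import make_classification
    from sklearn.metrics import f1_score
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, random_state=0)

    # 10-fold cross-validated F1: the mean over ten different held-out folds
    cv_f1 = cross_val_score(SVC(), X, y, cv=10, scoring="f1").mean()

    # holdout: one fixed train/test partition, one F1 number
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    holdout_f1 = f1_score(y_te, SVC().fit(X_tr, y_tr).predict(X_te))

    print(cv_f1, holdout_f1)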
1 vote
1 answer

cross validation clarification

I am having some trouble understanding how to implement cross-validation. In my case I am trying to apply it to an LVQ system. This is what I have understood so far... One of the parameters that can be adjusted for LVQ is the number of prototypes to…
ganninu93 • 1,551 • 14 • 24
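The tuning loop the LVQ question describes (try a candidate number of prototypes, cross-validate, keep the best) is model-agnostic. scikit-learn has no LVQ implementation, so the sketch below substitutes a k-nearest-neighbours classifier purely to show the pattern:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # evaluate each candidate value of the tunable parameter by k-fold CV
    best_k, best_score = None, -1.0
    for k in [1, 3, 5, 7, 9]:
        score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
        if score > best_score:
            best_k, best_score = k, score

    print(best_k, best_score)   # the value that scored best across the folds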
1 vote
0 answers

Matlab cross-validation on images with multiple class SVM

I am trying to perform cross-validation on images for my SVM, where I have three categories of labels for the classification: "Good", "Ok" and "Bad". For my data set, I have a 120 × 20 cell array: 19 columns of features, with the last column…
Piiinkyy • 367 • 1 • 3 • 14
1 vote
1 answer

In the Orange data mining toolkit, how do I specify groups for cross-validation?

I'm using the Orange GUI, and trying to perform cross-validation. My data has 8 different groups (specified by a variable in the input data), and I'd like each fold to hold out a different group. Is this possible to do using Orange? I can select the…
LeahNH • 535 • 1 • 4 • 12
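I am not sure whether the Orange GUI exposes this directly, but the "one fold per group" scheme the question asks for is what scikit-learn calls GroupKFold; the sketch below only illustrates the splitting idea, with made-up data:

    import numpy as np
    from sklearn.model_selection import GroupKFold

    X = np.random.rand(80, 3)
    y = np.random.randint(0, 2, 80)
    groups = np.repeat(np.arange(8), 10)    # 8 groups of 10 rows each

    # with n_splits equal to the number of groups, each fold holds out one whole group
    gkf = GroupKFold(n_splits=8)
    for train_idx, test_idx in gkf.split(X, y, groups=groups):
        print(np.unique(groups[test_idx]))  # exactly one group id per fold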
1 vote
1 answer

Cross Validation in Classification

I have two different datasets, dataset X and dataset Y, from which I calculate features to use for classification. Case 1: when I combine both into one large dataset and then use 10-fold cross-validation, I get very good classification results…
1 vote
1 answer

Leave-one-out cross-validation by leaving out two IDs during the training process

I have a dataframe df df<-structure(list(ID = c(4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 8, 8, 8, 9, 9), Y = c(2268.14043972082, 2147.62290922552, 2269.1387550775, 2247.31983098201, 1903.39138268307, 2174.78291538358, 2359.51909126411,…
SimonB • 670 • 1 • 10 • 25
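The dataframe in the question above is R, but the splitting scheme itself ("leave two IDs out per round") maps directly onto scikit-learn's LeavePGroupsOut; below is a hedged Python sketch with a toy ID column standing in for df:

    import numpy as np
    from sklearn.model_selection import LeavePGroupsOut

    # toy stand-in for the ID and Y columns of df
    ids = np.array([4, 4, 4, 5, 5, 5, 6, 6, 8, 8, 9, 9])
    X = np.random.rand(len(ids), 2)
    Y = np.random.rand(len(ids))

    # every split holds out all rows belonging to a pair of distinct IDs
    lpgo = LeavePGroupsOut(n_groups=2)
    for train_idx, test_idx in lpgo.split(X, Y, groups=ids):
        print(sorted(set(ids[test_idx])))   # the two IDs left out in this round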
1 vote
1 answer

Performing K-fold Cross-Validation: Using Same Training Set vs. Separate Validation Set

I am using the Python scikit-learn framework to build a decision tree. I am currently splitting my training data into two separate sets, one for training and the other for validation (implemented via K-fold cross-validation). To cross-validate my…
user5070125
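On using the same training set vs. a separate validation set: with K-fold cross-validation the held-out fold already plays the role of the validation set in each round, so carving out an additional fixed validation set is usually unnecessary. A hedged decision-tree sketch of the two approaches side by side:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # approach 1: a fixed split into training and validation sets
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    print(tree.score(X_val, y_val))          # one validation accuracy

    # approach 2: K-fold CV on all the training data; each fold is validated in turn
    cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
    print(cv_scores.mean())                  # average over the 5 validation folds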
1 vote
1 answer

Grouping rows from an R dataframe together when randomly assigning to training/testing datasets

I have a dataframe that consists of blocks of X rows, each corresponding to a single individual (where X can be different for each individual). I'd like to randomly distribute these individuals into train, test and validation samples but so far I…
anthr • 1,026 • 4 • 17 • 34
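The same "keep all rows of an individual together" problem exists outside R too; purely as an illustration of the idea, here is a sketch with scikit-learn's GroupShuffleSplit, splitting by a toy ID column so that no individual is shared between the train, validation and test partitions:

    import numpy as np
    from sklearn.model_selection import GroupShuffleSplit

    ids = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 5])  # block of rows per individual
    X = np.random.rand(len(ids), 4)

    # first carve off a test partition by individual, then split the rest into train/validation
    outer = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
    trainval_idx, test_idx = next(outer.split(X, groups=ids))

    inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=1)
    train_rel, val_rel = next(inner.split(X[trainval_idx], groups=ids[trainval_idx]))
    train_idx, val_idx = trainval_idx[train_rel], trainval_idx[val_rel]

    # the three ID sets are disjoint, so no individual leaks across partitions
    print(set(ids[train_idx]), set(ids[val_idx]), set(ids[test_idx]))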
1 vote
1 answer

Where is the score function in scikit-learn classifiers located?

When running cross-validation within scikit-learn, all classifiers seem to have a score() method with which I can easily check the accuracy of the classifier, e.g. from http://scikit-learn.org/stable/modules/cross_validation.html >>> import numpy…
alvas • 115,346 • 109 • 446 • 738
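To the question of where score() is defined: it is not a factory function but an ordinary method inherited from scikit-learn's mixins (sklearn.base.ClassifierMixin provides mean accuracy, RegressorMixin provides R²), which is why every standard classifier has it. A small check:

    from sklearn.base import ClassifierMixin
    from sklearn.datasets import load_iris
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    clf = SVC().fit(X, y)

    print(isinstance(clf, ClassifierMixin))  # True: score() is inherited from this mixin
    print(clf.score(X, y))                   # mean accuracy, i.e. ClassifierMixin.score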
1 vote
1 answer

Leave-one-ID-out cross-validation for a random uniform forest in R

I am using a dataframe df df<-structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L), .Label = c("AU-Tum", "AU-Wac", "BE-Bra", "BE-Jal", "BR-Cax", "BR-Sa3", "CA-Ca1", "CA-Ca2", "CA-Ca3",…
SimonB • 670 • 1 • 10 • 25
1 vote
0 answers

Find the optimal cross-validated L1 penalty for a given L2 penalty

I have a list of vectors (each vector is a variable) whose values can be 0 or 1, and these values represent the coefficients (a1, a2, ...) of my models: y = x1 * a1 + x2 * a2 + ... I need to use cross-validation to build a Poisson regression model that…
Nick • 10,309 • 21 • 97 • 201
1 vote
2 answers

How to evaluate the performance of different models on one dataset?

I want to evaluate the performance of different models such as SVM, RandForest, CNN etc., but I only have one dataset. So I split the dataset into a training set and a testing set, train the different models on the training data and test with the testing…
tidy • 4,747 • 9 • 49 • 89
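One common pattern for comparing several models on a single dataset is to cross-validate every candidate on the same folds rather than trusting one train/test split. A hedged sketch with an SVM and a random forest (the CNN is omitted since it would need a separate deep-learning library):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=400, random_state=0)

    # a single fixed splitter so every model is scored on exactly the same folds
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    for name, model in [("SVM", SVC()), ("RandForest", RandomForestClassifier(random_state=0))]:
        scores = cross_val_score(model, X, y, cv=cv)
        print(name, scores.mean(), scores.std())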
1 vote
1 answer

How to efficiently do cross-validation with big.matrix in R?

I have a function, as follows, that takes a design matrix X with class type big.matrix as input and predicts the responses. NOTE: the size of matrix X is over 10 GB. So I cannot load it into memory. I used read.big.matrix() to generate backing files…
SixSigma • 2,808 • 2 • 18 • 21