Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that each data point gets a chance to be validated against a model trained without it. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
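
As a minimal sketch of the basic k-fold form, assuming scikit-learn and a placeholder dataset and estimator:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Placeholder data and estimator; substitute your own.
    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # 5-fold cross-validation: every data point is held out exactly once.
    scores = cross_val_score(model, X, y, cv=5)
    print(scores.mean(), scores.std())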

2604 questions
1 vote · 1 answer

Predict classes of test data using k-fold cross-validation in sklearn

I am working on a data mining project and I am using the sklearn package in Python for classifying my data. In order to train my data and evaluate the quality of the predicted values, I am using the sklearn.cross_validation.cross_val_predict…
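
A minimal sketch of held-out class prediction with k folds, using the modern sklearn.model_selection module (the sklearn.cross_validation module named in the question is deprecated); the dataset and classifier are placeholders:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_predict
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Each sample is predicted by the model from the fold that held it out.
    y_pred = cross_val_predict(DecisionTreeClassifier(random_state=0), X, y, cv=5)
    print((y_pred == y).mean())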
1 vote · 1 answer

LDA cross validation and variable selection

I have a data frame with 395 observations and 36 variables. I am doing cross-validation to select the best few variables for classifying the student qualifications. I have written this code: k<-5 error <- c() for(l in 1:35){ if(l!=31 && l!=32 &&…
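
The question is in R, but here is a hedged Python analogue of the same idea (cross-validated variable selection around an LDA classifier), with a placeholder dataset:

    from sklearn.datasets import load_wine
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.feature_selection import SequentialFeatureSelector

    X, y = load_wine(return_X_y=True)

    # Greedy forward selection, scoring each candidate subset by 5-fold CV.
    selector = SequentialFeatureSelector(
        LinearDiscriminantAnalysis(), n_features_to_select=5, cv=5)
    selector.fit(X, y)
    print(selector.get_support(indices=True))  # indices of the kept variables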
1 vote · 2 answers

Python sklearn cross_validation: number of labels does not match number of samples

I am doing a course on machine learning and want to split the data into train and test sets, train a DecisionTree on the training set, and then print out the score on my test set. The cross-validation parameters in my code were…
hmmmbob
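
A common cause of "number of labels does not match number of samples" is unpacking train_test_split in the wrong order; a minimal sketch with placeholder data:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # Note the return order: X_train, X_test, y_train, y_test.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    clf = DecisionTreeClassifier().fit(X_train, y_train)
    print(clf.score(X_test, y_test))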
1 vote · 1 answer

How to use cross validation in MATLAB

I'm trying to build an SVM classifier in MATLAB and want to use cross-validation. But predictor = fitcsvm(features, vect, 'Standardize', true, 'CrossVal', 'on'); returns a ClassificationPartitionedModel, and the predict function cannot operate with…
Vladimir
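
With 'CrossVal','on', MATLAB returns a ClassificationPartitionedModel, whose per-fold predictions come from kfoldPredict rather than predict. The closest scikit-learn analogue, sketched with placeholder data:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_predict
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # Standardizing inside the pipeline (like 'Standardize', true) keeps
    # the scaler fitted on each training fold only.
    model = make_pipeline(StandardScaler(), SVC())
    y_pred = cross_val_predict(model, X, y, cv=5)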
1 vote · 1 answer

Parameter selection of SVM

I have a dataset which I use for classification with libSVM in MATLAB. The dataset consists of 4 classes. For parameter selection of the SVM I can do nested cross-validation. The problem is that I also need the value of the best parameters in the…
machinery
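
A hedged sketch of nested cross-validation in scikit-learn (the question uses libSVM in MATLAB): the outer loop estimates generalization error, and a final refit of the inner search exposes concrete parameter values:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
    inner = GridSearchCV(SVC(), param_grid, cv=3)

    # Outer loop: unbiased estimate of the whole tune-then-fit procedure.
    outer_scores = cross_val_score(inner, X, y, cv=5)

    # Refit the search on all data to obtain the best parameter values.
    inner.fit(X, y)
    print(outer_scores.mean(), inner.best_params_)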
1 vote · 0 answers

computed initial MA coefficients are not invertible [Python] [TSA] [ARIMAX] [CrossValidation]

I have an endog variable (with 200 observations) and an exog variable (with 200 observations). I want to train an ARIMAX model on 163 observations and predict the 181st observation, then train on 164 observations and predict the 182nd observation, and so on until train…
Supreeth Meka
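
A hedged sketch of the rolling-origin scheme described, using statsmodels' ARIMA with an exogenous regressor on synthetic placeholder data (training through observation t and forecasting t+18, i.e., the 181st observation from the first 163):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    endog = rng.normal(size=200).cumsum()  # placeholder series
    exog = rng.normal(size=(200, 1))       # placeholder regressor

    h = 18  # horizon: train through t, predict t + h
    preds = []
    for t in range(163, 200 - h + 1):
        # If fitting complains that MA coefficients are not invertible,
        # enforce_invertibility=False is one possible workaround.
        res = ARIMA(endog[:t], exog=exog[:t], order=(1, 0, 1)).fit()
        fc = res.forecast(steps=h, exog=exog[t:t + h])
        preds.append(fc[-1])  # the h-step-ahead point forecast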
1 vote · 1 answer

h2o.runif() always returns the same vector

I am writing the code for cross-validation of my models' performance. In order to split the data set randomly I use this method: h2o.runif(train.hex) Unfortunately it always returns the same vector: 0.7309678 0.2405364 0.6374174 0.5504370…
Ivan T
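
h2o.runif takes a seed argument in the R API, so fixing or varying the seed controls whether the vector repeats. The same seeded-uniform-split idea, sketched generically with NumPy:

    import numpy as np

    def split_mask(n, frac=0.75, seed=None):
        # A fixed seed reproduces the split; a fresh seed gives a new one.
        rng = np.random.default_rng(seed)
        return rng.uniform(size=n) < frac

    train_mask = split_mask(1000, frac=0.75, seed=1)
    test_mask = ~train_mask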
1 vote · 2 answers

Do we need a significance test when we use 10-fold cross-validation?

Usually, to show that our results are not due to chance, we use a significance test such as the t-test. But when we use 10-fold cross-validation we train and test our models on chunks of the dataset. I'm wondering whether we need a t-test when we have used 10-fold cross…
user3070752
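
A minimal sketch of a paired t-test over the ten fold scores of two classifiers, assuming scikit-learn and SciPy with placeholder models; note that the folds share training data, so this naive test is known to be optimistic:

    from scipy import stats
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    a = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=10)
    b = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)

    # Paired test: the same folds score both models.
    t, p = stats.ttest_rel(a, b)
    print(t, p)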
1 vote · 1 answer

In R caret, obtain in-sample and out-of-sample probability estimates

I have some data similar to: data(Titanic) # need one row per passenger df <- data.frame(Titanic, stringsAsFactors=TRUE) df <- df[rep(seq_len(nrow(df)), df[,"Freq"]), which(names(df)!="Freq")] I trained a model in caret using repeated…
C8H10N4O2
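
The question is about R's caret; a hedged scikit-learn analogue of the same distinction, with placeholder data:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X, y = load_iris(return_X_y=True)
    model = LogisticRegression(max_iter=1000)

    # Out-of-sample: each row is scored by a model that never saw it.
    proba_out = cross_val_predict(model, X, y, cv=5, method="predict_proba")

    # In-sample: refit on everything, then score the training rows.
    proba_in = model.fit(X, y).predict_proba(X)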
1 vote · 3 answers

'bad input shape' when using scikit-learn SVM and optunity

I'm trying to use the optunity package to tune my SVM model. I directly copied and pasted its up-to-date example code, just importing the feature array and data array: import optunity import optunity.metrics import sklearn.svm import numpy as…
Yank
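
A frequent cause of "bad input shape" with scikit-learn estimators is passing the labels as a 2-D column vector; a minimal reproduction and fix, with placeholder data:

    import numpy as np
    from sklearn.svm import SVC

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=(100, 1))  # column vector of labels

    clf = SVC()
    # clf.fit(X, y)        # depending on the version, complains about the
    #                      # shape of y or warns and ravels it internally
    clf.fit(X, y.ravel())  # flatten the labels to shape (100,)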
1 vote · 0 answers

Comparing RapidMiner models with x-validation

I am working on some forecasting models with RapidMiner and need some orientation to interpret the outputs and select the best model among them. I am following some tutorials to check their accuracy with x-validation, and I am getting results…
1 vote · 2 answers

Average values of precision, recall and F-score for each label

I'm cross-validating a sklearn classifier model and want to quickly obtain average values of precision, recall and F-score. How can I obtain those values? I don't want to code the cross-validation myself; instead I'm using the function…
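
A minimal sketch, assuming scikit-learn: pool the held-out predictions from all folds with cross_val_predict, then let classification_report compute precision, recall and F-score per label:

    from sklearn.datasets import load_iris
    from sklearn.metrics import classification_report
    from sklearn.model_selection import cross_val_predict
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # Every prediction comes from the fold that held the sample out.
    y_pred = cross_val_predict(SVC(), X, y, cv=5)
    print(classification_report(y, y_pred))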
1 vote · 1 answer

Time Series - Splitting Data Using The timeSlice Method

Referring to this post: createTimeSlices function in CARET package in R, where createTimeSlices was suggested as an option for cross-validating when using time series data. I would like to understand how to go about selecting values for…
dts86
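
In caret, createTimeSlices is driven by initialWindow (length of the first training window), horizon (how far ahead the test window reaches) and fixedWindow (whether the training window slides or grows). A hedged scikit-learn analogue of the same chronological splitting, with placeholder data:

    import numpy as np
    from sklearn.model_selection import TimeSeriesSplit

    X = np.arange(100).reshape(-1, 1)

    # Five chronological splits; training always precedes testing.
    tscv = TimeSeriesSplit(n_splits=5, test_size=10)
    for train_idx, test_idx in tscv.split(X):
        print(train_idx[-1], test_idx[0], test_idx[-1])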
1 vote · 1 answer

Confusion Matrices in Orange

I'm using cross-validation to evaluate the performance of the classification algorithms in Orange, but I have some doubts with respect to the confusion matrices: how can I store the confusion matrix associated with each fold of the…
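
Avoiding any claims about Orange's API, here is the general per-fold bookkeeping sketched with scikit-learn on placeholder data: one confusion matrix per fold, kept in a list:

    from sklearn.datasets import load_iris
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import StratifiedKFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    matrices = []
    for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
        clf = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
        matrices.append(confusion_matrix(y[test_idx], clf.predict(X[test_idx])))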
1 vote · 1 answer

Grid searching hyper-parameters of SVM-Anova and getting the chosen features in sklearn

There is an example in the sklearn docs of SVM-Anova. I want to further do GridSearchCV for hyper-parameters, i.e., C and gamma for SVM, for every percentile of features used in the example, like this: transform =…
Francis
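
A minimal sketch of grid searching C, gamma and the ANOVA percentile together, assuming a scikit-learn Pipeline over placeholder data; the selected features are then read off the refit best estimator:

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectPercentile, f_classif
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    pipe = Pipeline([("anova", SelectPercentile(f_classif)),
                     ("svc", SVC())])

    param_grid = {"anova__percentile": [10, 20, 50, 100],
                  "svc__C": [0.1, 1, 10],
                  "svc__gamma": [0.01, 0.1, 1]}

    search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)

    # Boolean mask of the features chosen by the winning pipeline.
    mask = search.best_estimator_.named_steps["anova"].get_support()
    print(search.best_params_, mask)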