Questions tagged [cross-validation]

Cross-validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that every data point gets a chance to be validated against. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.

2604 questions
1 vote · 1 answer

Leave-one-out regression using lasso in MATLAB

I have 300 data samples, each with a feature vector of around 4000 dimensions. Each input has a 5-dimensional output in the range of -2 to 2. I am trying to fit a lasso model to it. I went through a few posts which talk about cross validation strategies like…
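A minimal sketch of one way to set this up, assuming the inputs sit in a 300 x 4000 matrix X and the outputs in a 300 x 5 matrix Y (both names are assumptions): MATLAB's lasso fits a single response, so one model is fit per output column, and passing 'CV' equal to the number of rows gives leave-one-out.

```matlab
% Hedged sketch: leave-one-out CV for lasso, one model per output column.
% X (300 x 4000) and Y (300 x 5) are assumed variable names.
n = size(X, 1);
bestLambda = zeros(1, size(Y, 2));
for j = 1:size(Y, 2)
    % 'CV', n partitions the data into n folds, i.e. leave-one-out
    [B, FitInfo] = lasso(X, Y(:, j), 'CV', n);
    bestLambda(j) = FitInfo.LambdaMinMSE;  % lambda with the lowest LOO MSE
end
```

Note that this refits the whole lambda path 300 times per output; 'CV', 10 is a common, much cheaper compromise.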
1 vote · 3 answers

How to define the maximum k of the kNN classifier?

I am trying to use a kNN classifier to perform some supervised learning. In order to find the best value of k for kNN, I used cross validation. For example, the following code loads some MATLAB standard data and runs the cross validation to plot…
asked by Samo Jerom (2,361)
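For reference, a compact sketch of the usual pattern (the variables X and y are assumptions, not the poster's data): fit one kNN model per candidate k and compare the 10-fold cross-validated misclassification rates.

```matlab
% Sketch: choose k for kNN by 10-fold cross-validation (X, y assumed).
ks = 1:30;
cvErr = zeros(size(ks));
for i = 1:numel(ks)
    mdl   = fitcknn(X, y, 'NumNeighbors', ks(i));
    cvmdl = crossval(mdl, 'KFold', 10);   % cross-validated model
    cvErr(i) = kfoldLoss(cvmdl);          % misclassification rate
end
plot(ks, cvErr), xlabel('k'), ylabel('10-fold CV error')
[~, bestK] = min(cvErr);                  % k with the lowest CV error
```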
1 vote · 1 answer

Leave-one-out cross-validation and confusion matrix in kNN

I have to classify the iris data using k-nearest neighbors (k = 1:30). I have divided the data into sample and training sets using leave-one-out cross validation, so I have the following script: load fisheriris group=[ones(1,50), 2*ones(1,50),…
asked by user19565 (155)
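One hedged way to sketch this (the choice k = 5 and the use of fitcknn are assumptions; the poster's script builds a group vector instead): collect every left-out prediction, then build the confusion matrix in a single call.

```matlab
% Sketch: LOOCV predictions on Fisher's iris, then a confusion matrix.
load fisheriris                       % meas (150x4), species (150x1 cell)
n    = size(meas, 1);
pred = cell(n, 1);
for i = 1:n
    keep = true(n, 1);  keep(i) = false;        % leave observation i out
    mdl  = fitcknn(meas(keep, :), species(keep), 'NumNeighbors', 5);
    pred(i) = predict(mdl, meas(i, :));         % predicted label (cell)
end
C = confusionmat(species, pred)                 % rows = true classes
```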
1 vote · 0 answers

cv.glm cutoff value of 0.75 in R

I am doing some analysis on a binomial GLM that I fitted earlier in R. While looking at my data, I found that the suitable cutoff point for my binary outcome should be 0.75 instead of 0.5. I am trying to get the cost()…
asked by Error404 (6,959)
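For what it's worth, boot::cv.glm accepts a user-supplied cost function whose second argument is the fitted probability, so a 0.75 cutoff can be expressed directly (mydata and fit are assumed names):

```r
# Sketch: misclassification cost with a 0.75 cutoff for boot::cv.glm.
library(boot)
cost75 <- function(y, pi) mean(abs(y - as.numeric(pi > 0.75)))
cv.err <- cv.glm(mydata, fit, cost = cost75, K = 10)
cv.err$delta   # raw and adjusted cross-validated cost
```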
1 vote · 0 answers

How to compute precision, recall, and accuracy in 10-fold cross validation with classification in R?

There is a set of data with one label classifying each row, such as: class x1 x2; 1 1 3; 1 4 5; 2 7 0; 2 8 11. I am trying to compute the precision, recall, and accuracy of classification with 10-fold cross validation, but I do not know…
asked by cecilia_z (3)
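A hand-rolled sketch of the idea, assuming the data sit in a data frame df with a two-level factor column class (the poster's classifier is unspecified, so a logistic GLM stands in): pool the held-out predictions from all 10 folds and compute the three metrics from one confusion table.

```r
# Sketch: pooled 10-fold CV metrics; df and df$class are assumed names.
set.seed(1)
folds <- sample(rep(1:10, length.out = nrow(df)))
pred  <- factor(rep(NA, nrow(df)), levels = levels(df$class))
for (k in 1:10) {
  test <- folds == k
  fit  <- glm(class ~ ., data = df[!test, ], family = binomial)
  p    <- predict(fit, df[test, ], type = "response")
  pred[test] <- levels(df$class)[1 + (p > 0.5)]
}
tab <- table(truth = df$class, pred = pred)
accuracy  <- sum(diag(tab)) / sum(tab)
precision <- tab[2, 2] / sum(tab[, 2])   # level 2 treated as "positive"
recall    <- tab[2, 2] / sum(tab[2, ])
```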
1 vote · 1 answer

100% accuracy from libsvm

I'm training and cross-validating (10-fold) data using libSVM (with a linear kernel). The data consist of 1800 fMRI intensity voxels per datapoint. There are around 88 datapoints in the training-set file for svm-train. The…
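As a point of reference, LIBSVM's built-in cross-validation mode looks like the hypothetical invocation below; with only ~88 points in 1800 dimensions, a perfect linear-kernel score more often signals leakage between the training and test files, or a feature that encodes the label, than a genuinely perfect model.

```bash
# Hypothetical invocation: 10-fold CV with a linear kernel (-t 0);
# -v prints CV accuracy and deliberately writes no model file.
svm-train -t 0 -v 10 training_set_file
```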
1 vote · 1 answer

Leave-one-out cross-validation with the lm function in R

I have a dataset of 506 rows on which I am performing leave-one-out cross validation. Once I get the mean squared errors, I compute the mean of those mean squared errors, and this changes every time I run it. Is this expected? If so, can…
asked by pa1geek (258)
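Leave-one-out CV contains no random element, so the mean of the per-fold MSEs should be identical on every run; run-to-run variation suggests a randomly partitioned CV is being used instead. A deterministic sketch, with dat and its response column y as assumed names:

```r
# Sketch: deterministic LOOCV for lm; no sampling, so the result is
# reproducible across runs (dat and dat$y are assumed names).
n   <- nrow(dat)
mse <- numeric(n)
for (i in 1:n) {
  fit    <- lm(y ~ ., data = dat[-i, ])
  mse[i] <- (dat$y[i] - predict(fit, dat[i, , drop = FALSE]))^2
}
mean(mse)
```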
1 vote · 2 answers

Do I use the same idf from training set to perform cross validation?

I am trying to build an SVM classifier in SVM Light using the Vector Space Model. I have 1000 documents and a dictionary of terms I will be using to vectorize each document. Of the 1000 documents, 600 will be for my training set, while the remaining…
asked by Justin (742)
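The usual protocol, sketched below with assumed names (train_tf and test_tf as document-term count matrices): estimate the idf weights from the training fold only, then reuse them unchanged to vectorize the held-out fold, since recomputing idf on test documents leaks information into the evaluation.

```r
# Sketch: fit idf on the training fold, reuse it on the test fold.
idf <- log(nrow(train_tf) / colSums(train_tf > 0))   # per-term idf
train_tfidf <- sweep(train_tf, 2, idf, `*`)
test_tfidf  <- sweep(test_tf,  2, idf, `*`)          # no peeking at test df
```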
1 vote · 1 answer

How to view singularities in a model fitted with caret train in R

I've got a dataset that is 161 x 151 and I applied the following to it: > ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10, savePred = T) > model <- train(RT..seconds.~., data = cadets, method = "lm", trControl =…
asked by user2062207 (955)
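A hedged pointer: caret stores the final fitted lm in model$finalModel, so the standard lm tools show which of the 151 columns were dropped for singularity.

```r
# Sketch: inspect the lm that caret::train fitted.
summary(model$finalModel)                    # NA coefficients were dropped
names(which(is.na(coef(model$finalModel))))  # the singular terms
alias(model$finalModel)                      # the linear dependencies
```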
1 vote · 1 answer

What values to look at in cross-validated linear regression in the DAAG package

I performed the following on a data set that contains 151 variables and 161 observations: > library(DAAG) > fit <- lm(RT..seconds.~., data=cadets) > cv.lm(df = cadets, fit, m = 10) And got the following results: fold 1 Observations in test…
asked by user2062207 (955)
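One thing to extract, sketched under the assumption that cv.lm returns the input rows in their original order with a cvpred column of cross-validated predictions: the overall CV mean squared error, a single number suitable for comparing models.

```r
# Sketch: overall CV mean squared error from DAAG::cv.lm output
# (assumes the returned rows keep the input order).
out   <- cv.lm(df = cadets, fit, m = 10)
cvmse <- mean((cadets$RT..seconds. - out$cvpred)^2)
cvmse
```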
1 vote · 1 answer

Performing additional validation in LIBSVM (MATLAB)

I have been working with LIBSVM in MATLAB for a while to do prediction. I have a dataset of which I use 75% for training, 15% for finding the best parameters, and the rest for testing. The code is given below; trainX and trainY are the input and output training…
asked by ChanChow (1,346)
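One hedged way to use the 15% slice (valX and valY are assumed names mirroring the poster's trainX and trainY): grid-search C and gamma with the LIBSVM MATLAB interface, scoring each candidate on the validation split and keeping the best.

```matlab
% Sketch: parameter search scored on a held-out validation split.
best = -Inf;
for logC = -5:2:15
    for logG = -15:2:3
        opts  = sprintf('-c %g -g %g -q', 2^logC, 2^logG);
        model = svmtrain(trainY, trainX, opts);
        [~, acc, ~] = svmpredict(valY, valX, model);
        if acc(1) > best            % acc(1) = classification accuracy (%)
            best = acc(1);  bestOpts = opts;
        end
    end
end
```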
1 vote · 1 answer

Feature selection + cross-validation, but how to make ROC curves in R

I'm stuck on the following problem. I divide my data into 10 folds. Each time, I use 1 fold as the test set and the other 9 as the training set (I do this ten times). On each training set, I do feature selection (a filter method with chi.squared) and then I…
asked by Silke (177)
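One common resolution, sketched with the ROCR package (pooled_probs and pooled_labels are assumed vectors accumulated over the 10 test folds): pool every held-out probability and draw a single ROC curve, rather than averaging 10 separate curves.

```r
# Sketch: one ROC curve from the pooled held-out predictions.
library(ROCR)
pred <- prediction(pooled_probs, pooled_labels)
perf <- performance(pred, "tpr", "fpr")
plot(perf); abline(0, 1, lty = 2)
performance(pred, "auc")@y.values[[1]]   # pooled AUC
```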
1 vote · 2 answers

Cross-validation for model comparison

I have relatively big data: more than 370,000 observations, a categorical dependent variable with 250 levels, and 10 independent variables including both numeric and categorical variables. I want to perform 10-fold cross-validation for model…
asked by Archer (63)
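A sketch of the mechanics only (the rpart calls are stand-ins; with a 250-level outcome you would substitute whatever multiclass learners you are actually comparing): score every candidate on the same folds, so the comparison is paired and not driven by the split.

```r
# Sketch: 10-fold CV with the SAME folds for both candidate models.
library(rpart)
set.seed(1)
folds <- sample(rep(1:10, length.out = nrow(dat)))
acc <- matrix(NA, 10, 2, dimnames = list(NULL, c("full", "reduced")))
for (k in 1:10) {
  test <- folds == k
  m1 <- rpart(y ~ .,       data = dat[!test, ], method = "class")
  m2 <- rpart(y ~ x1 + x2, data = dat[!test, ], method = "class")
  acc[k, 1] <- mean(predict(m1, dat[test, ], type = "class") == dat$y[test])
  acc[k, 2] <- mean(predict(m2, dat[test, ], type = "class") == dat$y[test])
}
colMeans(acc)   # mean CV accuracy per model
```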
1 vote · 1 answer

Cross validation on fitted survival objects?

I can see how cv.glm works with a glm object, but what about fitted survival models? I have a bunch of models (Weibull, Gompertz, lognormal, etc.). I want to assess the prediction error using cross validation. Which package/function can do this in R?
asked by jnam27 (1,367)
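The pec package offers cross-validated prediction error curves for survival models; a manual alternative is sketched below (a data frame dat with time and status columns is an assumption), refitting a Weibull AFT model per fold and scoring the held-out fold with a concordance index. Other distributions swap in via dist (Gompertz needs e.g. flexsurv).

```r
# Sketch: 10-fold CV concordance for a parametric survival model.
library(survival)
set.seed(1)
folds <- sample(rep(1:10, length.out = nrow(dat)))
cidx  <- numeric(10)
for (k in 1:10) {
  test <- folds == k
  fit  <- survreg(Surv(time, status) ~ ., data = dat[!test, ],
                  dist = "weibull")
  lp   <- predict(fit, dat[test, ], type = "lp")  # higher lp = longer survival
  cidx[k] <- concordance(Surv(time, status) ~ lp,
                         data = dat[test, ])$concordance
}
mean(cidx)   # values well above 0.5 indicate useful discrimination
```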
1 vote · 2 answers

Cross Validation - Weka API

How can I build a classification model with 10-fold cross-validation using the Weka API? I ask because each cross-validation run creates a new classification model. Which classification model should I use on my test data?
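A hedged Java sketch of the standard answer: Evaluation.crossValidateModel only estimates performance and discards the per-fold models, so after cross-validating you train the deployable classifier once on all the data (the file name and the J48 learner are assumptions).

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class CvSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("train.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        J48 cls = new J48();
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(cls, data, 10, new Random(1)); // estimate only
        System.out.println(eval.toSummaryString());

        cls.buildClassifier(data);  // the model you actually apply to test data
    }
}
```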