Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that each data point gets a chance of being validated. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
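A minimal sketch of the k-fold scheme described above, using scikit-learn (one library among many; the idea itself is library-agnostic and the data here is a throwaway array):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Across the 5 rounds, each sample lands in the validation set exactly once.
val_counts = np.zeros(10, dtype=int)
for train_idx, val_idx in kf.split(X):
    val_counts[val_idx] += 1

assert (val_counts == 1).all()
```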

2604 questions
1 vote, 1 answer

Stratified KFold on sparse(csr) feature matrix

I have a large sparse matrix (95000, 12000) containing the features of my model. I want to do a stratified K fold cross validation using Sklearn.cross_validation module in python. However, I haven't found a way of indexing a sparse matrix in…
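With the modern sklearn.model_selection API (the Sklearn.cross_validation module mentioned above is long deprecated), fold indices are plain integer arrays, and a CSR matrix accepts them directly as row indices. A sketch on synthetic data (shapes are made up, smaller than the asker's):

```python
import numpy as np
from scipy.sparse import csr_matrix, random as sparse_random
from sklearn.model_selection import StratifiedKFold

X = csr_matrix(sparse_random(100, 50, density=0.05, random_state=0))
y = np.array([0, 1] * 50)  # binary labels for stratification

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    # Row-indexing a CSR matrix with an integer array works directly.
    X_train, X_test = X[train_idx], X[test_idx]
    assert X_train.shape == (80, 50) and X_test.shape == (20, 50)
```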
1 vote, 1 answer

Calculating AUC with Leave-One-Out cross-validation in mlr?

This is a quick question, just to make sure I'm not doing this the dumb way. I want to use auc as my measure in mlr, and I'm also using LOO due to the small sample size. Of course, in the LOO cross validation scheme the test sample is always only…
catastrophic-failure
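Per-fold AUC is undefined when the held-out set is a single observation, so the usual workaround is to pool the leave-one-out predictions and compute one AUC over all of them. A scikit-learn sketch of that pooling idea (the model and data are invented; mlr can aggregate its predictions the same way):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=30, n_features=5, random_state=0)

# Each LOO round predicts one sample; pool all 30 held-out
# probabilities, then compute a single AUC over the pooled set.
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=LeaveOneOut(), method='predict_proba')
auc = roc_auc_score(y, proba[:, 1])
assert 0.0 <= auc <= 1.0
```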
1 vote, 1 answer

Compute model efficiency in a cross validation leave one subject out mode in R

I have a dataframe df structure(list(x = c(49, 50, 51, 52, 53, 54, 55, 56, 1, 2, 3, 4, 5, 14, 15, 16, 17, 163, 164, 165, 153, 154, 72, 38, 39, 40, 23, 13, 14, 15, 5, 6, 74, 75, 76, 77, 78, 79, 80, 81, 82, 127, 128, 129, 130, 131,…
SimonB
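Leave-one-subject-out is the scheme scikit-learn calls LeaveOneGroupOut: one round per subject, training on the others. A sketch on synthetic data (the subject IDs, model, and RMSE metric are assumptions, not the asker's dataframe):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
subjects = np.repeat([1, 2, 3, 4], 10)   # 4 hypothetical subjects, 10 rows each
X = rng.normal(size=(40, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=40)

# One round per subject: fit on the other subjects, score on the held-out one.
logo = LeaveOneGroupOut()
rmse_per_subject = []
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    rmse_per_subject.append(mean_squared_error(y[test_idx], pred) ** 0.5)

assert len(rmse_per_subject) == 4   # one efficiency score per subject
```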
1 vote, 2 answers

Compute Random Forest with a leave one ID out cross validation in R

I have a dataframe df dput(df) structure(list(ID = c(4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 8, 8, 8, 9, 9), Y = c(2268.14043972082, 2147.62290922552, 2269.1387550775, 2247.31983098201, 1903.39138268307, 2174.78291538358,…
SimonB
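The same grouped scheme combines directly with a random forest; a hedged sketch with invented IDs and data rather than the asker's df:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(1)
ids = np.repeat([4, 5, 6, 8, 9], 8)   # hypothetical IDs, 8 rows each
X = rng.normal(size=(40, 3))
y = X[:, 0] * 3 + rng.normal(size=40)

# groups=ids makes each CV round hold out every row belonging to one ID.
scores = cross_val_score(RandomForestRegressor(n_estimators=50, random_state=0),
                         X, y, groups=ids, cv=LeaveOneGroupOut(),
                         scoring='neg_mean_squared_error')
assert len(scores) == 5   # one score per held-out ID
```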
1 vote, 0 answers

cv.glmnet Ridge Regression lambda.min = lambda.1se?

I'm currently running a ridge regression in R using the glmnet package; however, I recently ran into a new problem and was hoping for some help in interpreting my results. My data can be found here:…
dwm8
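For context on what equality of the two would mean: lambda.1se is defined as the largest λ whose CV error is within one standard error of the minimum, so the two coincide when the minimum sits at the boundary of the λ path or the curve is flat around it. The rule itself is easy to reproduce (a Python sketch on a made-up CV curve, not the asker's data):

```python
import numpy as np

# Hypothetical CV curve: one mean error and one standard error per lambda,
# lambdas sorted from largest to smallest as glmnet reports them.
lambdas = np.array([10.0, 5.0, 2.0, 1.0, 0.5])
cv_mean = np.array([4.0, 3.0, 2.2, 2.0, 2.3])
cv_se   = np.array([0.3, 0.3, 0.3, 0.3, 0.3])

i_min = np.argmin(cv_mean)
lambda_min = lambdas[i_min]

# 1-SE rule: largest lambda whose mean error is within one SE of the minimum.
threshold = cv_mean[i_min] + cv_se[i_min]
lambda_1se = lambdas[np.where(cv_mean <= threshold)[0][0]]

assert lambda_min == 1.0
assert lambda_1se == 2.0
```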
1 vote, 1 answer

What is the meaning of the GridSearchCV best_score_ attribute? (the value is different from the mean of the cross validation array)

I'm confused by the results; probably I'm not getting the concept of cross-validation and GridSearch right. I followed the logic behind this post:…
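For reference, best_score_ is documented as the mean cross-validated score of the best parameter setting (not a score on refit data), which can be checked directly against cv_results_ (toy grid and data below, not the asker's):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
gs = GridSearchCV(SVC(), {'C': [0.1, 1, 10]}, cv=5)
gs.fit(X, y)

# best_score_ equals the mean over folds at best_index_.
mean_best = gs.cv_results_['mean_test_score'][gs.best_index_]
assert np.isclose(gs.best_score_, mean_best)
```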
1 vote, 1 answer

10-fold cross-validation with a sample size that is not divisible by 10

I see papers that use 10-fold cross validation on data sets that have a number of samples indivisible by 10. I couldn't find any case where they explained how they chose each subset. My assumption is that they use resampling to some extent, but if…
zacdav
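No resampling is needed: standard k-fold splitters simply spread the remainder over the first n mod k folds, so fold sizes differ by at most one. Scikit-learn's KFold shows this directly (95 is a stand-in sample size):

```python
import numpy as np
from sklearn.model_selection import KFold

n = 95                              # not divisible by 10
kf = KFold(n_splits=10)
sizes = [len(test) for _, test in kf.split(np.zeros(n))]

# The first n % k folds get one extra sample each.
assert sorted(sizes) == [9] * 5 + [10] * 5
assert sum(sizes) == n
```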
1 vote, 1 answer

SciKit Learn feature selection and cross validation using RFECV

I am still very new to machine learning and trying to figure things out myself. I am using SciKit learn and have a data set of tweets with around 20,000 features (n_features=20,000). So far I achieved a precision, recall and f1 score of around 79%.…
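A self-contained RFECV sketch on a small synthetic stand-in for the 20,000-feature tweet matrix (shapes, model, and fold count are assumptions): RFECV recursively drops the weakest features and uses cross-validation to decide how many to keep.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Small synthetic stand-in for the real tweet feature matrix.
X, y = make_classification(n_samples=200, n_features=25, n_informative=5,
                           random_state=0)

# step=1: eliminate one feature per iteration; cv=5 scores each subset.
selector = RFECV(LogisticRegression(max_iter=1000), step=1, cv=5)
selector.fit(X, y)

assert 1 <= selector.n_features_ <= 25
assert selector.support_.sum() == selector.n_features_
```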
1 vote, 0 answers

cross validation matlab toolbox issue

Labels=[1; 0]; k=5; groups = Labels; cvFolds = crossvalind('Kfold', groups, k); I am getting an error that the Bioinformatics Toolbox is missing. Is there a way I could rewrite this function without using crossvalind?
shr m
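crossvalind('Kfold', …) essentially assigns each observation a random fold label, which takes two lines in any language (this sketch is Python and ignores crossvalind's stratified variants; the same idea translates to MATLAB with randperm and mod):

```python
import numpy as np

def kfold_indices(n, k, rng=None):
    """Random fold label (1..k) per observation, like crossvalind('Kfold', ...)."""
    rng = np.random.default_rng(rng)
    labels = np.resize(np.arange(1, k + 1), n)   # 1,2,...,k,1,2,... to length n
    return rng.permutation(labels)

folds = kfold_indices(10, 5, rng=0)
counts = np.bincount(folds)[1:]
assert counts.tolist() == [2, 2, 2, 2, 2]        # balanced folds
```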
1 vote, 1 answer

WEKA cross validation discretization

I'm trying to improve the accuracy of my WEKA model by applying an unsupervised discretize filter. I need to decide on the number of bins and whether equal frequency binning should be used. Normally, I would optimize this using a training set.…
user3197231
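WEKA's Discretize filter has no direct Python twin, but the underlying recipe is general: treat bin count and binning strategy as hyperparameters inside a pipeline, so CV re-fits the discretization on each training fold without leakage. A scikit-learn sketch of that idea (data, model, and grid are all assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Discretization lives inside the pipeline, so each CV training fold
# learns its own bin edges; the grid tunes bins and strategy together.
pipe = Pipeline([
    ('disc', KBinsDiscretizer(encode='ordinal')),
    ('clf', MultinomialNB()),
])
grid = {'disc__n_bins': [3, 5, 10],
        'disc__strategy': ['uniform', 'quantile']}  # quantile = equal frequency
gs = GridSearchCV(pipe, grid, cv=5)
gs.fit(X, y)

assert set(gs.best_params_) == {'disc__n_bins', 'disc__strategy'}
```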
1 vote, 1 answer

How to precompute foldid with even observations per fold for glmnet

According to the glmnet vignette, a foldid can be set up by: foldid=sample(1:10,size=length(y),replace=TRUE) However, if you look at the number of observations in each of the folds: > table(foldid) foldid 1 2 3 4 5 6 7 8 9 10 10 12 8 7…
fumikos
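The uneven counts come from sampling each fold label independently; repeating the labels out to length n and shuffling guarantees the counts differ by at most one. In R that is foldid <- sample(rep(1:10, length.out = length(y))); the same idea in Python (n is a made-up sample size):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 97

# Repeat 1..10 to length n, then shuffle: balanced fold labels.
foldid = rng.permutation(np.resize(np.arange(1, 11), n))

counts = np.bincount(foldid)[1:]
assert counts.max() - counts.min() <= 1
assert counts.sum() == n
```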
1 vote, 1 answer

Is there a discrepancy between createMultiFolds behavior and the resampling summary of a caret object?

I encountered a strange issue using custom folds for the cross-validation with caret. A MWE (in which the use of createMultiFolds doesn't really make sense) library(caret) #version 6.0-47 data(iris) set.seed(1) train.idx <-…
JeromeLaurent
1 vote, 2 answers

Multiple cross-validation + testing on a small dataset to improve confidence

I am currently working on a very small dataset of about 25 samples (200 features) and I need to perform model selection and also have a reliable classification accuracy. I was planning to split the dataset in a training set (for a 4-fold CV) and a…
lcit
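With only ~25 samples, a common alternative to carving out a fixed test set is nested cross-validation: an inner loop selects the model, an outer loop estimates its accuracy, so no samples are permanently sacrificed. A scikit-learn sketch on synthetic data of roughly the question's shape (grid and model are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score, StratifiedKFold

# Tiny dataset in the spirit of the question: 25 samples, 200 features.
X, y = make_classification(n_samples=25, n_features=200, n_informative=5,
                           random_state=0)

# Inner 4-fold CV tunes C; outer 5-fold CV scores the tuned model,
# giving an (almost) unbiased accuracy estimate on the full 25 samples.
inner = GridSearchCV(SVC(), {'C': [0.1, 1, 10]},
                     cv=StratifiedKFold(4, shuffle=True, random_state=0))
outer_scores = cross_val_score(inner, X, y,
                               cv=StratifiedKFold(5, shuffle=True, random_state=1))
assert len(outer_scores) == 5
```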
1 vote, 0 answers

Obtain ROC curve in cross-validation of Logistic Regression in MATLAB

I'm trying to calculate the ROC curve of a cross-validation. In particular, the parameter AUC (Area under the curve) and OPTROCPT (Optimal ROC Point). I think I can calculate them by averaging the AUC and the OptROCPt of each iteration, but I didn't get…
Frank
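A common alternative to averaging per-fold AUCs is to pool the held-out scores from every fold and compute a single ROC curve over all of them. A sketch of the pooled approach (in Python rather than MATLAB, with invented data, but the idea carries over):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Every sample is scored exactly once by a model that never saw it;
# one ROC curve and one AUC are then computed over the pooled scores.
proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                          cv=StratifiedKFold(5), method='predict_proba')[:, 1]
fpr, tpr, thresholds = roc_curve(y, proba)
auc = roc_auc_score(y, proba)
assert 0.0 <= auc <= 1.0
```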
1 vote, 0 answers

How to fit a GLM to a dataset estimating "only the post hoc values for the random effects"?

My goal is to implement a cross-validation procedure for linear mixed models. Let me start with what I want to do (which is described here), and already tell you that I get stuck at step 4. The goal: Fit a GLM to the data with one subject removed…
JBJ