Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that each data point gets a chance to be validated. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
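
For concreteness, here is a minimal sketch of k-fold cross-validation using scikit-learn, with synthetic data standing in for a real problem:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score

    X, y = make_classification(n_samples=200, random_state=0)  # synthetic stand-in data
    cv = KFold(n_splits=5, shuffle=True, random_state=0)       # k = 5 folds
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
    print(scores.mean(), scores.std())  # accuracy averaged over the 5 validation folds

Each of the five folds is held out exactly once, so every data point is validated by a model that never saw it during training.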

2604 questions
1 vote · 0 answers

WEKA SMOreg classifier significance testing

I am using the SMOreg classifier in WEKA to determine if there is a predictive relationship between one variable and several other variables. I am using 10-fold cross-validation to get my results. My teacher wants me to find the confidence of my…
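
One way to attach a significance statement to 10-fold results is a paired test on the per-fold scores against a baseline. The sketch below uses Python with scikit-learn's SVR as a stand-in for SMOreg, on synthetic data, since the original dataset is not shown:

    from scipy.stats import ttest_rel
    from sklearn.datasets import make_regression
    from sklearn.dummy import DummyRegressor
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=100, noise=10.0, random_state=0)  # stand-in data
    cv = KFold(n_splits=10, shuffle=True, random_state=0)
    model_scores = cross_val_score(SVR(), X, y, cv=cv)            # per-fold R^2
    base_scores = cross_val_score(DummyRegressor(), X, y, cv=cv)  # mean-only baseline
    t_stat, p_value = ttest_rel(model_scores, base_scores)        # paired over the same folds
    print(p_value)

Fold scores are not independent, which is why WEKA's own Experimenter offers a corrected resampled t-test for exactly this kind of comparison.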
1 vote · 1 answer

Creating a classifier in MATLAB to be used with classperf

I'm working on a new model and would like to use classperf to check the performance of my classifier. How do I make it use my classifier as opposed to one of the built-in ones? All the examples I found online use classifiers that are included in…
Leo Jweda • 2,481 • 3 • 23 • 34
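
classperf itself is MATLAB-specific, but the general pattern is to put your model behind the fit/predict interface the evaluation helper expects. A rough Python analogue with a toy custom classifier (all names here are hypothetical):

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score

    class MeanThresholdClassifier(BaseEstimator, ClassifierMixin):
        """Toy custom classifier: thresholds the first feature at its training mean."""
        def fit(self, X, y):
            self.classes_ = np.unique(y)
            self.threshold_ = X[:, 0].mean()
            return self

        def predict(self, X):
            return np.where(X[:, 0] > self.threshold_, self.classes_[1], self.classes_[0])

    X, y = make_classification(n_samples=100, random_state=0)  # stand-in data
    print(cross_val_score(MeanThresholdClassifier(), X, y, cv=5))  # per-fold accuracy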
1 vote · 1 answer

File for Weka cross-validation

I have three values associated with a specific file. Each file belongs to a specific family. I need to perform a cross-validation in Weka to understand whether, with these three values, I'm able to identify the family. Now, what are the steps to…
paolo2988 • 857 • 3 • 15 • 31
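
The usual arrangement is one row per file: the three values as attributes plus the family as the class label, after which any cross-validation run can tell you how separable the families are. A Python sketch of the idea on a made-up table (Weka's ARFF/CSV loading follows the same row layout):

    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical table: each row is one file's three values plus its family.
    df = pd.DataFrame({
        "v1": [0.2, 0.5, 0.9, 0.1, 0.7, 0.3],
        "v2": [1.1, 2.0, 0.4, 1.3, 0.8, 1.9],
        "v3": [3.0, 2.2, 1.5, 2.8, 1.1, 2.5],
        "family": ["A", "B", "A", "A", "B", "B"],
    })
    X, y = df[["v1", "v2", "v3"]], df["family"]
    print(cross_val_score(DecisionTreeClassifier(), X, y, cv=3))  # 3-fold on toy data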
1 vote · 1 answer

Is it possible to compare the classification ability of two sets of features by ROC?

I am learning about SVM and ROC. As I understand it, a ROC (receiver operating characteristic) curve is commonly used to show the classification ability of an SVM (support vector machine). I am wondering if I can use the same concept to compare two subsets…
Cassie • 1,179 • 6 • 18 • 30
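
In principle yes: score each feature subset with out-of-fold predictions and compare the resulting ROC curves or AUCs. A sketch with scikit-learn, where the synthetic data and the two subsets are stand-ins for the real ones:

    from sklearn.datasets import make_classification
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import cross_val_predict
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    subset_a, subset_b = list(range(10)), list(range(10, 20))  # two hypothetical feature subsets
    for cols in (subset_a, subset_b):
        # Out-of-fold decision scores keep the comparison honest.
        scores = cross_val_predict(SVC(), X[:, cols], y, cv=5, method="decision_function")
        print(roc_auc_score(y, scores))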
1 vote · 0 answers

Weka cross-validation wrong results

I am classifying 5 minutes of EEG data with 4 classes using a Bayesian network. When applying cross-validation I get 100% correct results, whereas when I use training and supplied test data (the first 3.7 minutes for training, 1.3 minutes for…
Mariam H • 171 • 1 • 3 • 11
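
A frequent cause of this pattern with EEG is temporal autocorrelation: shuffled folds put nearly identical adjacent windows on both sides of the split, while a chronological split (like the 3.7/1.3-minute one) does not. A sketch of the contrast on a synthetic autocorrelated signal:

    import numpy as np
    from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    raw = rng.normal(size=(1000, 4))
    # Smooth each column so adjacent samples are near-duplicates, as in windowed EEG features.
    smooth = np.apply_along_axis(lambda c: np.convolve(c, np.ones(25) / 25, "same"), 0, raw)
    y = (np.arange(1000) // 250) % 2  # labels change in long temporal blocks

    clf = KNeighborsClassifier(n_neighbors=1)
    shuffled = cross_val_score(clf, smooth, y, cv=KFold(5, shuffle=True, random_state=0))
    ordered = cross_val_score(clf, smooth, y, cv=TimeSeriesSplit(n_splits=5))
    print(shuffled.mean(), ordered.mean())  # inflated vs. honest estimate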
1 vote · 0 answers

How can we obtain the accuracy of each fold separately with the '-v' option in liblinear or libsvm?

In liblinear and libsvm, the option -v k allows us to run k-fold cross-validation. But to test statistical significance, I need the accuracy obtained on each fold. Of course, there is the long-drawn-out way of creating each fold and then running train and test…
Chthonic Project • 8,216 • 1 • 43 • 92
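
The -v option only prints the aggregate accuracy, so the per-fold numbers have to come from driving the folds yourself. With scikit-learn's liblinear-backed LinearSVC (stand-in data) that is a few lines:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=500, random_state=0)  # stand-in data
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    fold_scores = cross_val_score(LinearSVC(), X, y, cv=cv)  # one accuracy per fold
    print(fold_scores)  # feed these into the significance test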
0 votes · 0 answers

Residual standard deviation out of caret cross-validation

I am using the following code to train an xgboost model:

caret::trainControl(
  method = "repeatedcv",  # cross-validation
  number = 5,             # with n folds
  repeats = 1,
  p = 0.6,
  # index = createFolds(tr_treated$Id_clean), # fix…
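
caret keeps per-resample metrics such as RMSE in the fitted object's resample slot, but a cross-validated residual standard deviation is also easy to compute from out-of-fold predictions. A Python sketch of that computation (a scikit-learn regressor stands in for the xgboost model):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import KFold, cross_val_predict

    X, y = make_regression(n_samples=300, noise=15.0, random_state=0)  # stand-in data
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    pred = cross_val_predict(GradientBoostingRegressor(random_state=0), X, y, cv=cv)
    residuals = y - pred          # out-of-fold residuals
    print(residuals.std(ddof=1))  # cross-validated residual standard deviation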
0 votes · 0 answers

Comparison between DenseNet and GCN models shows abnormally high performance for GCN - Request for data leakage validation

Hello. I am attempting a binary classification task for Alzheimer's disease (AD) vs. mild cognitive impairment (MCI) using 3D grayscale PET brain images with PyTorch. Data: data from 282 patients (158 with AD, 124 with MCI) and the…
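
The first leakage check in a setup like this is that splitting happens at the patient level, so no subject contributes images to both train and test. A sketch with scikit-learn's GroupKFold (array shapes and names here are hypothetical):

    import numpy as np
    from sklearn.model_selection import GroupKFold

    n_images, n_patients = 846, 282                  # hypothetical: ~3 volumes per patient
    X = np.random.rand(n_images, 64)                 # stand-in features
    y = np.random.randint(0, 2, n_images)            # AD / MCI labels
    patient_id = np.random.randint(0, n_patients, n_images)

    for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups=patient_id):
        overlap = set(patient_id[train_idx]) & set(patient_id[test_idx])
        assert not overlap, "patient appears on both sides of the split"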
0 votes · 0 answers

Linear regression returning perfect values on metrics even in cross-validation

I'm building an API using sklearn's algorithms RandomForestRegressor, DecisionTreeRegressor, SVR, LinearRegression, KNeighborsRegressor. Among all the models, the one that achieved the best predictions was the LinearRegression model, as shown…
Liam Park • 414 • 1 • 9 • 26
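
Perfect metrics that survive cross-validation usually point to a feature that encodes the target, since CV cannot detect leakage that is baked into the columns themselves. A toy demonstration:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(size=200)

    X_leaky = np.column_stack([X, y])  # one column is derived from the target
    print(cross_val_score(LinearRegression(), X, y, cv=5).mean())        # realistic R^2
    print(cross_val_score(LinearRegression(), X_leaky, y, cv=5).mean())  # ~1.0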
0 votes · 0 answers

Problem running time-series cross-validation with horizon=1 for Lasso and Elastic Net

I am trying to do hyperparameter tuning for Lasso and Elastic Net using R caret's train function. I want horizon=1 so that each fold is validated on only one data point.

myTimeControl <- trainControl(method = "timeslice", …
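
One wrinkle with a single validation point per fold is that R^2-style metrics are undefined on one observation, so the scorer has to be a pointwise error. For reference, the equivalent one-step-ahead scheme in Python (scikit-learn, synthetic series):

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))                  # stand-in regressors
    y = 2 * X[:, 0] + rng.normal(size=100)

    cv = TimeSeriesSplit(n_splits=20, test_size=1)  # one validation point per fold
    search = GridSearchCV(Lasso(), {"alpha": np.logspace(-3, 1, 10)},
                          cv=cv, scoring="neg_mean_absolute_error")  # pointwise metric
    search.fit(X, y)
    print(search.best_params_)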
0 votes · 0 answers

Optimize code using RepeatedStratifiedKFold

I'm running the following code:

import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, train_test_split
from sklearn.metrics import roc_curve,…
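
Without the rest of the loop it is hard to say where the time goes, but the usual optimization is to let cross_val_score drive all repeats and parallelize across folds with n_jobs, instead of looping manually:

    from sklearn.datasets import make_classification
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
    scores = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y,
                             cv=cv, scoring="roc_auc", n_jobs=-1)
    print(scores.shape)  # (50,) = 5 folds x 10 repeats, evaluated in parallel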
0 votes · 1 answer

Is this a proper cross-validation code with the Leave-One-Group-Out method?

Though the code below “works” (in that it does not raise an error), I get very high AUCs, which makes me wonder whether it somehow skips the kind of cross-validation I am trying to make it conduct. Each group indicates the collection of data…
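
For comparison, a minimal Leave-One-Group-Out loop looks like this (scikit-learn, hypothetical arrays): every sample of one group is held out at a time, and the AUC is computed only on that held-out group.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import LeaveOneGroupOut

    X = np.random.rand(300, 10)              # stand-in features
    y = np.random.randint(0, 2, 300)
    groups = np.repeat(np.arange(10), 30)    # 10 data collections

    aucs = []
    for tr, te in LeaveOneGroupOut().split(X, y, groups):
        model = RandomForestClassifier(random_state=0).fit(X[tr], y[tr])
        aucs.append(roc_auc_score(y[te], model.predict_proba(X[te])[:, 1]))
    print(np.mean(aucs))  # with random labels this should hover near 0.5

If a correct loop like this still yields very high AUCs on real data, the groups themselves may share information.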
0 votes · 0 answers

How to interpret the mtry tuning parameter in a random forest model when using PCA preprocessing (caret package)

I am trying to run a random forest model with cross-validation after PCA preprocessing with the caret package. I am predicting two classes (variable dg) using 381 predictors, and I have 100 observations. I was expecting that after preprocessing the…
Matyas K. • 1 • 1
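
The key point is that after PCA the forest sees only the retained components, so mtry is sampled from the component count rather than from the original 381 columns. The same arrangement sketched in Python, as a stand-in for the caret pipeline:

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=100, n_features=381, random_state=0)
    pipe = Pipeline([("pca", PCA(n_components=0.95)),  # keep 95% of the variance
                     ("rf", RandomForestClassifier(random_state=0))])
    # max_features (mtry) now refers to PCA components, not the 381 inputs.
    grid = GridSearchCV(pipe, {"rf__max_features": [2, 5, 10]}, cv=5)
    grid.fit(X, y)
    print(grid.best_params_)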
0 votes · 1 answer

How to get training F1 and recall scores out of GridSearchCV?

My aim here is to create a pipeline to handle preprocessing and to do nested cross-validation to prevent information leakage. I'm making one pipeline per model and will then compare the performances and pick the best model. Questions: I can get out the…
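
Two settings control this in scikit-learn: pass the metrics as a scoring dict (with refit naming the one used for model selection) and set return_train_score=True, which is off by default. A sketch on stand-in data:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV

    X, y = make_classification(n_samples=300, random_state=0)  # stand-in data
    grid = GridSearchCV(
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
        scoring={"f1": "f1", "recall": "recall"},  # multi-metric scoring
        refit="f1",                # required when several metrics are given
        return_train_score=True,   # exposes the training scores
        cv=5,
    )
    grid.fit(X, y)
    print(grid.cv_results_["mean_train_f1"])
    print(grid.cv_results_["mean_train_recall"])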
0 votes · 0 answers

Regression tree and penalty parameter

I have the following data and I need to create a regression tree in R using the rpart library to predict the rental duration. The structure of the data is:

data.frame: 1000 obs. of 6 variables:
 $ rental_duration : int 6 3 7 5 6 3 6 6 3 6…
Alex • 67 • 1 • 8
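
In rpart the penalty is the complexity parameter cp, typically tuned against the cross-validated error that printcp reports. The analogous knob in Python's scikit-learn trees is ccp_alpha, sketched here on synthetic data since the real frame is not available:

    from sklearn.datasets import make_regression
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)
    grid = GridSearchCV(DecisionTreeRegressor(random_state=0),
                        {"ccp_alpha": [0.0, 0.1, 1.0, 10.0]},  # pruning penalty grid
                        cv=5)
    grid.fit(X, y)
    print(grid.best_params_)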