Questions tagged [cross-validation]

Cross-Validation is a method of evaluating and comparing predictive systems in statistics and machine learning.

Cross-Validation is a statistical method of evaluating and comparing learning algorithms by dividing data into two segments: one used to learn or train a model and the other used to validate the model.

In typical cross-validation, the training and validation sets cross over in successive rounds so that each data point gets a chance to be validated against the model. The basic form of cross-validation is k-fold cross-validation.

Other forms of cross-validation are special cases of k-fold cross-validation or involve repeated rounds of k-fold cross-validation.
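For illustration, a minimal k-fold sketch in Python, assuming scikit-learn; the dataset and estimator are arbitrary stand-ins:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Each of the k folds serves exactly once as the validation set while the
# remaining k-1 folds are used for training.
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y,
    cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores, scores.mean())
```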

2604 questions
0 votes, 0 answers

How to use learning curves and cross-validation?

My aim is to prove whether there is overfitting or underfitting. However, when I calculate the learning curves (which graphically depict how performance improves with training), the standard deviation of the cross-validation score is enormous. My observation here is…
vdu16 · 123
0 votes, 0 answers

How to plot the mean ROC curve across folds for each class in a multiclass classification

I evaluate the performance of a random forest using 5-fold cross-validation on a multiclass classification problem. The curve I get looks like the attached picture. The code I use is as follows: cv=StratifiedKFold(n_splits=5) classifier =…
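One way to produce such a plot, sketched on synthetic data; the interpolation grid and classifier settings are assumptions, not the asker's code:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import label_binarize

X, y = make_classification(n_samples=600, n_classes=3, n_informative=6,
                           random_state=0)
classes = np.unique(y)
cv = StratifiedKFold(n_splits=5)
mean_fpr = np.linspace(0, 1, 100)
tprs = {c: [] for c in classes}

for train_idx, test_idx in cv.split(X, y):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    proba = clf.predict_proba(X[test_idx])
    y_bin = label_binarize(y[test_idx], classes=classes)
    for i, c in enumerate(classes):
        fpr, tpr, _ = roc_curve(y_bin[:, i], proba[:, i])
        tprs[c].append(np.interp(mean_fpr, fpr, tpr))  # align folds on one grid

for c in classes:  # average the per-fold TPRs, one curve per class
    mean_tpr = np.mean(tprs[c], axis=0)
    plt.plot(mean_fpr, mean_tpr,
             label=f"class {c} (mean AUC = {auc(mean_fpr, mean_tpr):.2f})")
plt.plot([0, 1], [0, 1], "k--")
plt.xlabel("False positive rate"); plt.ylabel("True positive rate")
plt.legend(); plt.show()
```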
0 votes, 0 answers

Confusing results when running the quanteda.classifiers::crossval function

I have been trying to use the following code to run the integrated quanteda crossval function. The code works, but the results look really strange to me in that they differ a lot from what I get when I implement a cross-validation loop…
0 votes, 0 answers

How to use the commonly used wrapper for models from statsmodels to apply cross-validation?

I read the relevant discussion here: Using statsmodel estimations with scikit-learn cross validation, is it possible? In that discussion it is advised to use a wrapper for models from statsmodels so that the cross_val_score function…
Xtiaan · 252
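The wrapper pattern advised in that discussion looks roughly like the following sketch; the class name SMWrapper and the add_constant handling are illustrative choices:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.model_selection import cross_val_score

class SMWrapper(BaseEstimator, RegressorMixin):
    """Wrap a statsmodels regression class (e.g. sm.OLS) as an sklearn estimator."""
    def __init__(self, model_class, add_constant=True):
        self.model_class = model_class
        self.add_constant = add_constant

    def fit(self, X, y):
        X = sm.add_constant(X) if self.add_constant else X
        self.results_ = self.model_class(y, X).fit()
        return self

    def predict(self, X):
        X = sm.add_constant(X) if self.add_constant else X
        return self.results_.predict(X)

X = np.random.RandomState(0).normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * np.random.RandomState(1).normal(size=100)
print(cross_val_score(SMWrapper(sm.OLS), X, y, cv=5))  # R^2 per fold
```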
0 votes, 0 answers

What's the correct way to format X and y from a binary dataframe for Stratified K-Fold cross-validation

My data is a dataframe of 25 columns and 2737 rows containing binary data. The goal is to train using each row as an input and get as output a probabilistic prediction of what the next sequence could be. Data in this scenario is always…
Wisdom · 121
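A minimal sketch of the formatting step, assuming one of the 25 binary columns (the hypothetical c24 below) is the target to stratify on:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randint(0, 2, size=(2737, 25)),
                  columns=[f"c{i}" for i in range(25)])

X = df.drop(columns="c24").to_numpy()  # 24 input columns as a 2-D array
y = df["c24"].to_numpy()               # the column being predicted

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # StratifiedKFold stratifies on y; rows are indexed positionally.
    print(fold, train_idx.shape, test_idx.shape)
```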
0 votes, 0 answers

Getting an incorrect number of rows when using the predict function in a cross-validation exercise

I'm performing a K-fold exercise with K = 10 for polynomials of degree 1 to 5, with the purpose of identifying which polynomial best fits the data provided. Nevertheless, when I try to predict Y-hat using the testing data (X-test), which…
Lucpi · 1
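A sketch of the exercise in Python (the question itself may use another language); predicting with the fold's test rows always yields exactly one prediction per test row:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.3 * rng.normal(size=200)

kf = KFold(n_splits=10, shuffle=True, random_state=0)
for degree in range(1, 6):
    mses = []
    for train_idx, test_idx in kf.split(X):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X[train_idx], y[train_idx])
        y_hat = model.predict(X[test_idx])      # len(y_hat) == len(test_idx)
        mses.append(mean_squared_error(y[test_idx], y_hat))
    print(f"degree {degree}: CV MSE = {np.mean(mses):.4f}")
```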
0 votes, 0 answers

Custom cross-validation for Ridge in sklearn

I have written the following algorithm to implement a Ridge regression and estimate its parameter via cross-validation. In particular, I wanted to achieve the following: for the purpose of cross-validation, the train set is divided into 10 folds.…
NC520 · 346
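For comparison, a compact sketch of the same goal using GridSearchCV, where the 10 folds are carved out of the training set only; the alpha grid is an assumption:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=300, n_features=10, noise=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 10-fold CV over the penalty strength, using the training set only.
grid = GridSearchCV(Ridge(), {"alpha": np.logspace(-3, 3, 13)}, cv=10)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```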
0 votes, 0 answers

Bootstrapping the uncertainty on an RMSE estimate of a location-scale generalized additive model

I have height data of plants (numeric, in cm; Height) measured over time (numeric, days of the year; Doy). These data are grouped per genotype (factor; Genotype) and individual plant (factor; Individual). I've…
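A generic sketch of bootstrapping an RMSE estimate; a plain linear model stands in for the location-scale GAM, since the resampling logic is the same:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(size=200)
pred = LinearRegression().fit(X, y).predict(X)

boot_rmse = []
for _ in range(1000):
    idx = rng.randint(0, len(y), len(y))     # resample cases with replacement
    boot_rmse.append(np.sqrt(np.mean((y[idx] - pred[idx]) ** 2)))
print(np.percentile(boot_rmse, [2.5, 97.5]))  # interval for the RMSE estimate
```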
0 votes, 0 answers

Unexpected behaviour (inflated results on random-data) in scikit-learn with nested cross-validation

When trying to train/evaluate a support vector machine in scikit-learn, I am experiencing some unexpected behaviour and I am wondering whether I am doing something wrong or whether this is a possible bug. In a very specific subset of circumstances,…
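A sanity-check sketch of nested cross-validation on pure-noise data: with labels independent of the features, outer scores should hover around chance (about 0.5 for two balanced classes), and consistently higher scores would suggest leakage. The grid values below are assumptions:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 20))
y = rng.randint(0, 2, size=200)   # labels independent of X

inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)  # inner loop tunes C
scores = cross_val_score(inner, X, y, cv=5)             # outer loop evaluates
print(scores.mean())  # expect roughly 0.5 on noise
```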
0 votes, 0 answers

About Sklearn double cross validation with wrapper feature_selection

About Double-CV or Nested-CV. The simplest example would be: from sklearn.model_selection import cross_val_score, GridSearchCV; from sklearn.ensemble import RandomForestRegressor; from sklearn.pipeline import Pipeline; gcv =…
x H · 11
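A hedged completion of the pattern the excerpt starts, with RFE standing in as the wrapper feature selector and illustrative parameter values:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=200, n_features=15, noise=10, random_state=0)

pipe = Pipeline([
    ("select", RFE(RandomForestRegressor(n_estimators=50, random_state=0))),
    ("model", RandomForestRegressor(random_state=0)),
])
gcv = GridSearchCV(pipe, {"select__n_features_to_select": [5, 10]}, cv=3)
# Outer CV: the selector and grid search are refit inside every outer fold,
# so feature selection never sees the outer test data.
print(cross_val_score(gcv, X, y, cv=5))
```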
0 votes, 0 answers

Implementation of early stopping with gradient descent

I am developing an algorithm based on gradient descent and I would like to add early stopping regularization. I have an objective function, F, and I minimize it with respect to W. This is given in the code below: Data: X_Train, Y_Train t=1; while (t…
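A minimal sketch of early stopping wrapped around plain gradient descent, assuming a least-squares objective F(W) and a held-out validation split; the patience counter is a common convention, not the asker's code:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.5 * rng.normal(size=200)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

W = np.zeros(5)
best_W, best_val, patience, wait = W.copy(), np.inf, 10, 0
for t in range(10000):
    grad = X_tr.T @ (X_tr @ W - y_tr) / len(y_tr)   # gradient of 0.5 * MSE
    W -= 0.1 * grad
    val_loss = np.mean((X_val @ W - y_val) ** 2)
    if val_loss < best_val:                 # keep the best model seen so far
        best_val, best_W, wait = val_loss, W.copy(), 0
    else:
        wait += 1
        if wait >= patience:                # stop once validation stalls
            break
print(t, best_val)
```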
0 votes, 0 answers

Troubles with Cross-Validation

I am having trouble implementing cross-validation. I understand that after cross-validation I have to re-train the model, but I have the following doubts: do a train_test_split before cross-validation and use X_train and y_train for cross-validation…
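The commonly recommended workflow this question is circling looks like the sketch below: split once, cross-validate on the training portion only, then re-fit on the full training set:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
print(cross_val_score(model, X_train, y_train, cv=5))  # model selection
model.fit(X_train, y_train)                 # re-train on all of the train set
print(model.score(X_test, y_test))          # final estimate on held-out data
```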
0 votes, 1 answer

Error in newdata[, object$model.list$variables] : subscript out of bounds

When I run this code, I get the error "Error in newdata[, object$model.list$variables] : subscript out of bounds". I cannot figure out how to solve…
linta · 15
0 votes, 0 answers

Plot training metrics from multiple cross-validation folds in TensorFlow

I'm closely following code from this tutorial for my data and it's training nicely: https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#class_weights The only key difference I've made (other than the dataset) is that I perform k-fold…
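A hedged sketch of collecting and overlaying per-fold Keras training histories; random data replaces the tutorial's dataset, and the tiny model is an arbitrary stand-in:

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.model_selection import KFold

rng = np.random.RandomState(0)
X = rng.normal(size=(500, 10)).astype("float32")
y = rng.randint(0, 2, size=500).astype("float32")

histories = []
for train_idx, val_idx in KFold(n_splits=5).split(X):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    h = model.fit(X[train_idx], y[train_idx],
                  validation_data=(X[val_idx], y[val_idx]),
                  epochs=10, verbose=0)
    histories.append(h.history)   # one History dict per fold

for i, h in enumerate(histories):  # overlay the validation-loss curves
    plt.plot(h["val_loss"], label=f"fold {i} val_loss")
plt.xlabel("epoch"); plt.ylabel("loss"); plt.legend(); plt.show()
```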
0 votes, 1 answer

scikit-learn cross_validate: reveal test set indices

In sklearn.model_selection.cross_validate, is there a way to output the samples/indices that were used as the test set by the CV splitter for each fold?
roble · 304
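One answer-shaped sketch: recent scikit-learn versions (1.3+) add a return_indices parameter to cross_validate; on older versions, iterating the splitter directly reproduces the same indices:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_validate

X, y = load_iris(return_X_y=True)
cv = KFold(n_splits=5)
res = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=cv,
                     return_indices=True)
print(res["indices"]["test"][0])   # test-set indices of the first fold

# Without return_indices: the splitter yields the same index arrays.
for train_idx, test_idx in cv.split(X, y):
    pass  # test_idx matches the indices cross_validate used per fold
```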