A technique in cross-validation where the data is partitioned into k subsets (or "folds"): k-1 folds are used for training and the remaining fold for evaluation. The process is repeated k times, holding out a different fold for evaluation each time.
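A minimal sketch of the procedure using scikit-learn's KFold (the estimator and toy arrays below are placeholders, not part of any particular question):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy data: 100 samples, 4 features, binary target
X = np.random.rand(100, 4)
y = np.random.randint(0, 2, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X):
    model = LogisticRegression()           # fresh model for every fold
    model.fit(X[train_idx], y[train_idx])  # train on k-1 folds
    preds = model.predict(X[val_idx])      # evaluate on the held-out fold
    scores.append(accuracy_score(y[val_idx], preds))

print(np.mean(scores))  # average performance across the k folds
```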
Questions tagged [k-fold]
284 questions
2
votes
0 answers
How to k-fold cross validate with 3D continuous data?
I'm training an LSTM network for timeseries regression using tensorflow. I've got 400+ data sets (3 inputs, 1 target) which I've sliced into 20-sample-long windows.
My training data is hence an input and a target numpy array of shape (no. of…
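The question is truncated, but a common approach for windowed 3D data is to split on the window (first) axis and index the arrays with the fold indices, which keeps each window intact. A sketch, assuming hypothetical arrays X of shape (n_windows, 20, 3) and y of shape (n_windows, 1):

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical windowed data: (n_windows, timesteps, features) and matching targets
X = np.random.rand(400, 20, 3)
y = np.random.rand(400, 1)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in kf.split(X):
    # Indexing along axis 0 keeps the 3D window structure intact
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]
    # model.fit(X_train, y_train, validation_data=(X_val, y_val)) would go here
```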
2
votes
2 answers
How to use K-Fold Cross Validation For This CNN?
I have tried to implement K Fold Cross Validation for my binary image classifier, but I have been struggling for a while as I have been stuck with the whole data processing side of things. I have included my code below (it is quite long and messy -…

Joel
- 63
- 1
- 7
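The asker's code isn't shown in full, so as a rough pattern only: if the images fit in memory as NumPy arrays (an assumption here, X of shape (n, 64, 64, 3) and binary labels y), StratifiedKFold can produce the index splits while a small Keras model is rebuilt inside the loop so no weights leak between folds:

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import StratifiedKFold

# Hypothetical image data already loaded into memory
X = np.random.rand(200, 64, 64, 3).astype("float32")
y = np.random.randint(0, 2, size=200)

def build_model():
    # Rebuild the model from scratch for each fold
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_acc = []
for train_idx, val_idx in skf.split(X, y):
    model = build_model()
    model.fit(X[train_idx], y[train_idx], epochs=3, batch_size=32, verbose=0)
    _, acc = model.evaluate(X[val_idx], y[val_idx], verbose=0)
    fold_acc.append(acc)

print(np.mean(fold_acc))
```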
2
votes
1 answer
How to combine Scikit-Learn's GroupKFold and StratifiedKFold
I am working with an imbalanced data set that has multiple observations from the same set of users. I want to make sure that I don't have the same users in both the training and test sets while still maintaining the original distribution as much as…

slax
- 21
- 1
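Depending on the scikit-learn version, a combined splitter may already exist: StratifiedGroupKFold (added in scikit-learn 1.0) tries to preserve the class distribution while keeping each group in a single fold. A sketch with hypothetical user IDs as the groups:

```python
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold  # scikit-learn >= 1.0

# Hypothetical imbalanced data with repeated users
X = np.random.rand(1000, 5)
y = np.random.choice([0, 1], size=1000, p=[0.9, 0.1])
user_ids = np.random.randint(0, 100, size=1000)

sgkf = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in sgkf.split(X, y, groups=user_ids):
    # No user appears on both sides of the split; class ratios stay roughly balanced
    assert set(user_ids[train_idx]).isdisjoint(user_ids[test_idx])
```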
2
votes
3 answers
Difference between GroupSplitShuffle and GroupKFolds
As the title says, I want to know the difference between sklearn's GroupKFold and GroupShuffleSplit.
Both make train-test splits for data that has a group ID, so the groups don't get separated in the split. I checked on one train/test set for…

amestrian
- 546
- 3
- 12
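One way to see the difference empirically: GroupKFold places every group in the test set exactly once across its n_splits partitions, while GroupShuffleSplit draws independent random splits, so a group can recur in several test sets or never be tested at all. A small sketch:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, GroupShuffleSplit

X = np.arange(20).reshape(-1, 1)
y = np.zeros(20)
groups = np.repeat(np.arange(5), 4)  # 5 groups of 4 samples each

# Deterministic partition: each group lands in the test fold exactly once
gkf = GroupKFold(n_splits=5)
print([set(groups[test]) for _, test in gkf.split(X, y, groups)])

# Independent random splits: groups may repeat across test sets or never appear
gss = GroupShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
print([set(groups[test]) for _, test in gss.split(X, y, groups)])
```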
2
votes
1 answer
Weights&Biases Sweep Keras K-Fold Validation
I'm using Weights&Biases Cloud-based sweeps with Keras.
So first I create a new Sweep within a W&B Project with a config like the following:
description: LSTM Model
method: random
metric:
  goal: maximize
  name: val_accuracy
name:…

Ragnar
- 45
- 7
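The config above is truncated, so only as a rough pattern: one common way to combine a sweep with k-fold validation is to run the whole fold loop inside a single sweep run and log the mean of the metric the sweep optimizes. The train_fold helper below is hypothetical and stands in for the actual Keras training:

```python
import numpy as np
import wandb
from sklearn.model_selection import KFold

def train_fold(config, X_train, y_train, X_val, y_val):
    # Placeholder for the real Keras training: build the model from the sweep's
    # hyperparameters, fit on the training fold, return validation accuracy.
    return float(np.random.rand())

def sweep_run(X, y, n_splits=5):
    run = wandb.init()          # the sweep agent injects the sampled config
    config = wandb.config
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    accs = [train_fold(config, X[tr], y[tr], X[va], y[va]) for tr, va in kf.split(X)]
    # Log the metric named in the sweep config so the sweep can maximize it
    wandb.log({"val_accuracy": float(np.mean(accs))})
    run.finish()
```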
2
votes
2 answers
Data reshaping for Keras not working with K-Fold Validation
I have a dataset which I am shaping for a Keras network as follows:
scaler.fit(X)
X_Scaled = pd.DataFrame(scaler.transform(X.values), columns=X.columns, index=X.index)
X_Scaled.info()
X_data = X_Scaled.values
X_data =…

SDROB
- 125
- 2
- 14
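The excerpt cuts off at the reshape step, but a frequent source of shape errors with KFold is reshaping the full array once and then indexing with fold indices that assume the original layout. One sketch (the 2-D feature matrix and the (samples, timesteps, features) target layout are assumptions) that splits first and reshapes each partition afterwards:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical scaled feature matrix: 300 samples, 12 features
X_data = np.random.rand(300, 12)
y_data = np.random.rand(300)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in kf.split(X_data):
    # Split first, then reshape each partition to the layout Keras expects,
    # so the fold indices always refer to whole samples
    X_train = X_data[train_idx].reshape(len(train_idx), 1, 12)
    X_val = X_data[val_idx].reshape(len(val_idx), 1, 12)
    y_train, y_val = y_data[train_idx], y_data[val_idx]
```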
2
votes
1 answer
Does cross_val_score not fit the actual input model?
I am working on a project in which I am dealing with a large dataset.
I need to train the SVM classifier within the KFold cross-validation library from Sklearn.
import pandas as pd
from sklearn import svm
from sklearn.metrics import…

Raj Rajeshwari Prasad
- 304
- 2
- 17
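cross_val_score clones the estimator for every fold, so the object passed in is left unfitted; if a fitted model is needed afterwards, it has to be fit explicitly. A small demonstration (the SVC and synthetic data are stand-ins for the asker's setup):

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import KFold, cross_val_score
from sklearn.utils.validation import check_is_fitted

X, y = make_classification(n_samples=200, random_state=0)
clf = svm.SVC(kernel="linear")

scores = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
print(scores)  # one score per fold, each computed on that fold's held-out data

try:
    check_is_fitted(clf)   # the original estimator was only cloned, never fit
except NotFittedError:
    clf.fit(X, y)          # fit it on the full data if the model itself is needed
```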
2
votes
1 answer
Convert SpatialPolygonsDataFrame to projected coordinates using spTransform
I'm trying to do a point pattern analysis. To do this I have to convert a SpatialPolygonsDataFrame so it contains projected coordinates instead of curved coordinates. However, I keep getting the same error:
Error in…

Sara
- 33
- 3
2
votes
2 answers
how to learn from each fold in the k-fold cross validation?
When performing k-fold cross-validation, for every fold, we have a different validation set and a slightly changed learning set. Say that you progress from the first fold to the second fold. How is what you learned from the first fold being…

smaillis
- 298
- 3
- 12
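Nothing is carried over between folds: each fold fits an independent copy of the estimator from scratch, and only the per-fold scores are combined (usually averaged) at the end. A sketch that makes the per-fold cloning explicit, using a generic random forest as a stand-in:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=300, random_state=0)
template = RandomForestClassifier(random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X):
    model = clone(template)                # fresh, unfitted copy: no learning carries over
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

print(np.mean(scores), np.std(scores))     # folds are combined only at the score level
```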
2
votes
1 answer
Unsure about the purpose of get_n_splits and why it is necessary
I'm following a kernel on Kaggle and came across this code.
#Validation function
n_folds = 5
def rmsle_cv(model):
    kf = KFold(n_folds, shuffle=True, random_state=42).get_n_splits(train.values)
    rmse = np.sqrt(-cross_val_score(model,…

apang
- 93
- 1
- 12
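In that kernel, get_n_splits simply returns the number of splits as an integer, so the shuffle and random_state settings are effectively discarded and cross_val_score just sees cv=5; passing the KFold object itself keeps them. A sketch of the difference with a generic regressor:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, random_state=0)
model = Ridge()

# get_n_splits only returns the integer 5; shuffle/random_state are lost downstream
kf_as_int = KFold(5, shuffle=True, random_state=42).get_n_splits(X)
print(kf_as_int)  # 5

# Passing the splitter object itself keeps the shuffling behaviour
kf = KFold(5, shuffle=True, random_state=42)
rmse = np.sqrt(-cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=kf))
print(rmse)
```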
2
votes
1 answer
How to create Training Sets for K-Fold Cross Validation without scikit-learn?
I have a data set that has 95 rows and 9 columns and want to do a 5-fold cross-validation. In the training, the first 8 columns (features) are used to predict the ninth column. My test sets are correct, but my x training set is of size (4,19,9) when…

shreya17
- 33
- 1
- 5
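Without scikit-learn, one way to build the training sets is to split the row indices into k chunks and concatenate all but one chunk per fold; the training arrays then stay two-dimensional instead of collapsing into a ragged (k-1, fold_size, n_cols) shape. A sketch with a hypothetical 95x9 array:

```python
import numpy as np

data = np.random.rand(95, 9)          # hypothetical stand-in for the real data set
k = 5
indices = np.arange(len(data))
np.random.shuffle(indices)
folds = np.array_split(indices, k)    # k index chunks (sizes may differ by one)

for i in range(k):
    test_idx = folds[i]
    # Concatenate the remaining chunks so the training set stays 2-D
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    X_train, y_train = data[train_idx, :8], data[train_idx, 8]
    X_test, y_test = data[test_idx, :8], data[test_idx, 8]
    print(X_train.shape, X_test.shape)  # e.g. (76, 8) (19, 8)
```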
2
votes
0 answers
Why does my Random Forest Classifier perform better on test and validation data than on training data?
I'm currently training a random forest on some data I have and I'm finding that the model performs better on the validation set, and even better on the test set, than on the train set. Here are some details of what I'm doing - please let me know if…

Maks
- 21
- 1
2
votes
2 answers
10-fold cross-validation and obtaining RMSE
I'm trying to compare the RMSE I have from performing multiple linear regression upon the full data set, to that of 10-fold cross validation, using the KFold module in scikit-learn. I found some code that I tried to adapt but I can't get it to work…

immaprogrammingnoob
- 167
- 2
- 13
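One compact way to get a 10-fold RMSE for comparison with the full-data fit is cross_val_score with the neg_mean_squared_error scorer, taking the square root of the per-fold results. A sketch with synthetic regression data in place of the asker's set:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)
model = LinearRegression()

# RMSE on the full data set (fit and evaluated on the same data)
full_rmse = np.sqrt(mean_squared_error(y, model.fit(X, y).predict(X)))

# 10-fold cross-validated RMSE: sqrt of each fold's MSE, then averaged
kf = KFold(n_splits=10, shuffle=True, random_state=0)
cv_rmse = np.sqrt(-cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=kf))
print(full_rmse, cv_rmse.mean())
```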
2
votes
1 answer
KFold, cross_val_score: what data is the output computed on (sklearn wrapper)?
I can't understand the output of
kfold_results = cross_val_score(xg_cl, X_train, y_train, cv=kfold, scoring='roc_auc')
The output of xgb.cv is clear - there are the train and test scores:
[0] train-auc:0.927637+0.00405497 …

Alex Ivanov
- 657
- 1
- 8
- 17
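Unlike xgb.cv, cross_val_score reports only the held-out side: each element of the returned array is the score on one fold's validation portion of X_train, computed with a model fit on the remaining folds, and no train-side AUC is included. A small illustration (GradientBoostingClassifier is used here only as a stand-in for the xgboost sklearn wrapper):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier  # stand-in for the xgboost wrapper
from sklearn.model_selection import KFold, cross_val_score

X_train, y_train = make_classification(n_samples=400, random_state=0)
xg_cl = GradientBoostingClassifier(random_state=0)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
kfold_results = cross_val_score(xg_cl, X_train, y_train, cv=kfold, scoring="roc_auc")

# Five numbers, one per fold: the ROC AUC on that fold's held-out validation part.
# There is no train-side column like in xgb.cv's output.
print(kfold_results)
```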
2
votes
1 answer
Get individual model scores at every iteration / fold in k-fold validation
I am trying to perform k-fold validation in Scala. I am using a random forest model and RMSE as an evaluator. I can get the RMSE values only for the best model.
Code:
val rf = new…

Shashank BR
- 65
- 1
- 6