Questions tagged [k-fold]

A cross-validation technique in which the data is partitioned into k subsets (or "folds"). In each round, k-1 folds are used for training and the remaining fold for evaluation; the process is repeated k times, leaving out a different fold for evaluation each time.
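The rotation described above can be sketched in a few lines with scikit-learn's `KFold` (the array shapes here are illustrative):

```python
# Minimal sketch: 5-fold cross-validation indices with scikit-learn's KFold.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
kf = KFold(n_splits=5, shuffle=True, random_state=0)

for train_idx, test_idx in kf.split(X):
    # Each round: 4 folds (8 samples) train, 1 fold (2 samples) held out.
    assert len(train_idx) == 8 and len(test_idx) == 2
```

Across the five rounds, every sample lands in the held-out fold exactly once.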

284 questions
2
votes
0 answers

How to k-fold cross validate with 3D continuous data?

I'm training an LSTM network for time-series regression using TensorFlow. I've got 400+ data sets (3 inputs, 1 target) which I've sliced into 20-sample-long windows. My training data is hence an input and a target numpy array of shape (no. of…
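A point worth noting for questions like this one: `KFold.split` only indexes the first axis, so a 3D windowed array splits exactly like a 2D one. A sketch, with hypothetical shapes standing in for the asker's data:

```python
# Sketch: KFold works on 3D (windowed) data because it only produces indices
# along the first axis; those indices slice an (n_windows, timesteps, features)
# array just as they would a 2D X. Shapes below are illustrative.
import numpy as np
from sklearn.model_selection import KFold

X = np.random.rand(400, 20, 3)   # 400 windows, 20 timesteps, 3 inputs
y = np.random.rand(400, 1)       # one target per window

kf = KFold(n_splits=5)
for train_idx, val_idx in kf.split(X):
    X_train, X_val = X[train_idx], X[val_idx]
    y_train, y_val = y[train_idx], y[val_idx]

assert X_train.shape == (320, 20, 3) and X_val.shape == (80, 20, 3)
```

For temporally ordered windows, scikit-learn's `TimeSeriesSplit` is usually the safer splitter, since plain k-fold lets the model train on windows that come after the validation windows.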
2
votes
2 answers

How to use K-Fold Cross Validation For This CNN?

I have tried to implement k-fold cross-validation for my binary image classifier, but I have been struggling for a while, as I have been stuck on the data-processing side of things. I have included my code below (it is quite long and messy -…
Joel
  • 63
  • 1
  • 7
2
votes
1 answer

How to combine Scikit-Learn's GroupKFold and StratifiedKFold

I am working with an imbalanced data set that has multiple observations from the same set of users. I want to make sure that I don't have the same users in both the training and test sets while still maintaining the original distribution as much as…
slax
  • 21
  • 1
2
votes
3 answers

Difference between GroupShuffleSplit and GroupKFold

As the title says, I want to know the difference between sklearn's GroupKFold and GroupShuffleSplit. Both make train/test splits for data that has a group ID, so the groups don't get separated in the split. I checked on one train/test set for…
amestrian
  • 546
  • 3
  • 12
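The difference comes down to partitioning versus sampling: `GroupKFold` partitions the groups into k disjoint test folds, so every group is held out exactly once, while `GroupShuffleSplit` draws each train/test split independently at random, so a group may land in several test sets or in none. A small sketch (groups and sizes are illustrative):

```python
# Sketch: GroupKFold partitions groups across test folds (each group tested
# exactly once); GroupShuffleSplit draws independent random splits instead.
import numpy as np
from sklearn.model_selection import GroupKFold, GroupShuffleSplit

X = np.arange(12).reshape(12, 1)
groups = np.repeat([0, 1, 2, 3], 3)  # 4 groups of 3 rows

gkf_test_groups = [set(groups[test])
                   for _, test in GroupKFold(n_splits=4).split(X, groups=groups)]
# The 4 test folds cover every group exactly once.
assert sorted(g for s in gkf_test_groups for g in s) == [0, 1, 2, 3]

gss = GroupShuffleSplit(n_splits=4, test_size=0.25, random_state=0)
gss_test_groups = [set(groups[test]) for _, test in gss.split(X, groups=groups)]
# Independent draws: repeats are possible, full coverage is not guaranteed.
```

Both splitters guarantee that no single group straddles a train/test boundary; only the fold-assignment policy differs.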
2
votes
1 answer

Weights&Biases Sweep Keras K-Fold Validation

I'm using Weights & Biases cloud-based sweeps with Keras. First I create a new sweep within a W&B project with a config like the following: description: LSTM Model method: random metric: goal: maximize name: val_accuracy name:…
Ragnar
  • 45
  • 7
2
votes
2 answers

Data reshaping for Keras not working with K-Fold Validation

I have a dataset which I am shaping for a Keras network as follows: scaler.fit(X) X_Scaled = pd.DataFrame(scaler.transform(X.values), columns=X.columns, index=X.index) X_Scaled.info() X_data = X_Scaled.values X_data =…
SDROB
  • 125
  • 2
  • 14
2
votes
1 answer

Does cross_val_score not fit the actual input model?

I am working on a project in which I am dealing with a large dataset. I need to train an SVM classifier with KFold cross-validation from sklearn. import pandas as pd from sklearn import svm from sklearn.metrics import…
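The short answer to this question is no: `cross_val_score` clones the estimator internally for every fold and fits the clones, leaving the object you passed in untouched and unfitted. A sketch demonstrating this behavior (the dataset here is synthetic):

```python
# Sketch: cross_val_score fits clones of the estimator, not the original.
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_is_fitted

X, y = make_classification(n_samples=60, random_state=0)
clf = svm.SVC()
scores = cross_val_score(clf, X, y, cv=5)  # one score per fold
assert len(scores) == 5

try:
    check_is_fitted(clf)
    fitted = True
except NotFittedError:
    fitted = False
assert not fitted  # the input model was never fitted
```

To obtain a usable model afterwards, call `clf.fit(X, y)` yourself on whatever data you choose.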
2
votes
1 answer

Convert SpatialPolygonsDataFrame to projected coordinates using spTransform

I'm trying to do a point pattern analysis. To do this, I have to convert a SpatialPolygonsDataFrame so that it contains projected coordinates instead of curved coordinates. However, I keep getting the same error: Error in…
Sara
  • 33
  • 3
2
votes
2 answers

How to learn from each fold in k-fold cross-validation?

When performing k-fold cross-validation, for every fold we have a different validation set and a slightly changed learning set. Say that you progress from the first fold to the second fold. How is what you learned from the first fold being…
smaillis
  • 298
  • 3
  • 12
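The usual answer to this question is that nothing is carried over between folds: each fold trains an independent model from scratch, and what you aggregate across folds are the scores (to estimate generalization), not the learned weights. A sketch of that standard loop (model choice and data are illustrative):

```python
# Sketch: no learned state crosses fold boundaries. A fresh model is trained
# per fold; the per-fold scores are averaged to estimate generalization.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=100, random_state=0)
scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression()           # a brand-new model every fold
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))

mean_score = float(np.mean(scores))  # the quantity k-fold CV estimates
```

Once the estimate is in hand, a final model is typically retrained on all the data.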
2
votes
1 answer

Unsure about the purpose of get_n_splits and why it is necessary

I'm following a kernel on Kaggle and came across this code. #Validation function n_folds = 5 def rmsle_cv(model): kf = KFold(n_folds, shuffle=True, random_state=42).get_n_splits(train.values) rmse= np.sqrt(-cross_val_score(model,…
apang
  • 93
  • 1
  • 12
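The pitfall in that Kaggle snippet is that `get_n_splits` merely returns the number of folds as an integer, so assigning its result to `kf` discards the configured splitter: `cross_val_score` then receives an int and falls back to a default, unshuffled KFold, silently ignoring `shuffle=True` and `random_state=42`. A sketch:

```python
# Sketch: get_n_splits returns only the fold count (an int). Assigning its
# result throws the configured splitter away; pass the splitter itself as cv.
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=42)
n = kf.get_n_splits()
assert n == 5 and isinstance(n, int)

# Correct usage keeps the shuffle/random_state settings:
#   cross_val_score(model, X, y, cv=kf)        # not cv=kf.get_n_splits(...)
```

In other words, `get_n_splits` exists so generic code can ask any splitter how many iterations `split` will yield; it is not needed to build the folds.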
2
votes
1 answer

How to create training sets for k-fold cross-validation without scikit-learn?

I have a data set that has 95 rows and 9 columns and want to do a 5-fold cross-validation. In the training, the first 8 columns (features) are used to predict the ninth column. My test sets are correct, but my x training set is of size (4,19,9) when…
shreya17
  • 33
  • 1
  • 5
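A shape like (4, 19, 9) typically means the folds were stacked into a 3D array instead of concatenated back into a 2D training set. A sketch of a manual 5-fold split using NumPy alone, with the asker's 95x9 dimensions:

```python
# Sketch: manual 5-fold CV with NumPy. np.array_split makes the folds; the
# training set for fold i concatenates the other folds back into a 2D array
# (stacking them instead is what produces a 3D shape like (4, 19, 9)).
import numpy as np

data = np.arange(95 * 9).reshape(95, 9)   # 95 rows, 9 columns
folds = np.array_split(data, 5)           # five folds of 19 rows each

for i in range(5):
    test = folds[i]
    train = np.concatenate([f for j, f in enumerate(folds) if j != i])
    assert train.shape == (76, 9) and test.shape == (19, 9)
    X_train, y_train = train[:, :8], train[:, 8]   # first 8 cols predict the 9th
```

Shuffling the row order before splitting (e.g. with `np.random.permutation(95)`) is advisable if the rows are not already in random order.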
2
votes
0 answers

Why does my Random Forest Classifier perform better on test and validation data than on training data?

I'm currently training a random forest on some data I have and I'm finding that the model performs better on the validation set, and even better on the test set, than on the train set. Here are some details of what I'm doing - please let me know if…
Maks
  • 21
  • 1
2
votes
2 answers

10-fold cross-validation and obtaining RMSE

I'm trying to compare the RMSE I have from performing multiple linear regression upon the full data set, to that of 10-fold cross validation, using the KFold module in scikit learn. I found some code that I tried to adapt but I can't get it to work…
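For this comparison, scikit-learn can produce per-fold RMSE directly; the only wrinkle is that its scorers follow a "greater is better" convention, hence the negated scorer name and the sign flip. A sketch on synthetic regression data:

```python
# Sketch: per-fold RMSE from 10-fold CV on a linear regression. sklearn
# scorers are "greater is better", so the RMSE scorer is negated.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=100, n_features=3, noise=1.0, random_state=0)
kf = KFold(n_splits=10, shuffle=True, random_state=0)
neg_rmse = cross_val_score(LinearRegression(), X, y, cv=kf,
                           scoring="neg_root_mean_squared_error")
rmse_per_fold = -neg_rmse           # flip the sign back to plain RMSE
mean_rmse = rmse_per_fold.mean()    # compare this to the full-data RMSE
```

The full-data RMSE will usually look optimistic next to `mean_rmse`, since it is computed on the same rows the model was fitted to.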
2
votes
1 answer

KFold, cross_val_score: on what data is the output based (sklearn wrapper)?

I can't understand the output of kfold_results = cross_val_score(xg_cl, X_train, y_train, cv=kfold, scoring='roc_auc') The output of xgb.cv is clear - there are the train and test scores: [0] train-auc:0.927637+0.00405497 …
Alex Ivanov
  • 657
  • 1
  • 8
  • 17
2
votes
1 answer

Get individual model scores at every iteration / fold in k-fold validation

I am trying to perform k-fold validation in Scala. I am using a random forest model and RMSE as the evaluator. I can get the RMSE values only for the best model. Code: val rf = new…
Shashank BR
  • 65
  • 1
  • 6