I want to understand k-fold cross-validation more clearly, and how to choose the best model once it has been applied.
According to this source: https://machinelearningmastery.com/k-fold-cross-validation/
the steps to carry out k-fold cross-validation are:
1. Shuffle the dataset randomly.
2. Split the dataset into k groups.
3. For each unique group:
   - Take the group as a hold-out or test data set.
   - Take the remaining groups as a training data set.
   - Fit a model on the training set and evaluate it on the test set.
   - Retain the evaluation score and discard the model.
4. Summarize the skill of the model using the sample of model evaluation scores.
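As far as I can tell, the steps above can be sketched as a plain loop. This is only my reading of them, not code from the article: the toy data from `make_classification` and the `LogisticRegression` classifier are stand-ins I chose so the example runs on its own, and `fold_model` and `fold_scores` are names I made up.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Toy data standing in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Steps 1 and 2: shuffle the data, then split it into k groups.
kf = KFold(n_splits=4, shuffle=True, random_state=0)

fold_scores = []
for train_index, test_index in kf.split(X):
    # Step 3: one group is held out, the remaining groups are used for training.
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # A new, unfitted model is created for every fold.
    fold_model = LogisticRegression(max_iter=1000)
    fold_model.fit(X_train, y_train)

    # "Retain the evaluation score": keep only the accuracy on the held-out group.
    fold_scores.append(fold_model.score(X_test, y_test))
    # "Discard the model": fold_model is replaced on the next pass
    # and is not used again.

# Step 4: summarize the scores collected across the folds.
fold_scores = np.array(fold_scores)
print("Accuracy: %0.2f (+/- %0.2f)" % (fold_scores.mean(), fold_scores.std() * 2))
```

If this reading is right, the loop produces one score per fold and no fitted model is kept at the end.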
However, I have a question about this process.
What is "Retain the evaluation score and discard the model" supposed to mean, and how do you do it?
From my research, I believe it may have to do with the scikit-learn function cross_val_score(), but when I pass my model to it, it raises the following error:
Traceback (most recent call last):
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\joblib\parallel.py", line 797, in dispatch_one_batch
    tasks = self._ready_batches.get(block=False)
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\temporary.py", line 187, in <module>
    scores = cross_val_score(model, X_test, y_test, cv=kf,scoring="accuracy")
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\sklearn\model_selection\_validation.py", line 390, in cross_val_score
    error_score=error_score)
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\sklearn\model_selection\_validation.py", line 236, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\joblib\parallel.py", line 1004, in __call__
    if self.dispatch_one_batch(iterator):
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\joblib\parallel.py", line 808, in dispatch_one_batch
    islice = list(itertools.islice(iterator, big_batch_size))
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\sklearn\model_selection\_validation.py", line 236, in <genexpr>
    for train, test in cv.split(X, y, groups))
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\sklearn\base.py", line 67, in clone
    % (repr(estimator), type(estimator)))
TypeError: Cannot clone object '<keras.engine.sequential.Sequential object at 0x00000267F9C851C8>' (type <class 'keras.engine.sequential.Sequential'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
According to the documentation (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html), the first argument of cross_val_score() must be an estimator, which it describes as "estimator object implementing 'fit'. The object to use to fit the data."
My Keras model does have a fit() method, so I can't understand the exception.
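To narrow this down without Keras, I tried a minimal sketch that calls scikit-learn's `clone` directly, since that is the function at the bottom of the traceback. The two classes, `OnlyFit` and `FitWithParams`, are hypothetical and exist only for this test. It suggests that having `fit` alone is not enough for `clone`, whereas a class that inherits `get_params` from `BaseEstimator` is accepted.

```python
from sklearn.base import BaseEstimator, clone


class OnlyFit:
    """Hypothetical class: has fit() but no get_params()."""

    def fit(self, X, y):
        return self


class FitWithParams(BaseEstimator):
    """Hypothetical class: inherits get_params()/set_params() from BaseEstimator."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        return self


# Cloning works when get_params() is available.
cloned = clone(FitWithParams(alpha=0.5))
print(cloned.get_params())

# Cloning an object that only implements fit() raises a TypeError.
try:
    clone(OnlyFit())
    clone_error = None
except TypeError as exc:
    clone_error = str(exc)
print(clone_error)
```

The message from the second call has the same wording as the one I get with my Sequential model, so the problem seems to be the missing `get_params`, not `fit`.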
This is the relevant part of my code:
import keras
from keras.models import Sequential
from keras.layers import (Embedding, Conv1D, BatchNormalization,
                          MaxPooling1D, Flatten, Dense)
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# max_words, embedding_dim, maxlen, data, labels and callbacks_list
# are defined earlier in the script (not shown).
model = Sequential()
model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model.add(Conv1D(filters=32, kernel_size=8, activation='relu'))
model.add(BatchNormalization(weights=None, epsilon=1e-06, momentum=0.9))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(10, activation='relu'))
model.add(Dense(4, activation='softmax'))
model.summary()

kf = KFold(n_splits=4, random_state=None, shuffle=True)
print(kf)

for train_index, test_index in kf.split(data):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = data[train_index], data[test_index]
    y_train, y_test = labels[train_index], labels[test_index]

adam = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)

model.compile(optimizer=adam,
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])

history = model.fit(X_train, y_train,
                    epochs=15,
                    batch_size=32,
                    verbose=1,
                    callbacks=callbacks_list,
                    validation_data=(X_test, y_test))

# This is the call that raises the TypeError shown above (line 187 in the traceback).
scores = cross_val_score(model, X_test, y_test, cv=kf, scoring="accuracy")
print(scores)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
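For comparison, this is the kind of call I expected to work, as a minimal sketch of how I understand the documentation. A plain scikit-learn classifier and toy data are used as stand-ins (they are not my actual network or dataset, and `X_all`, `y_all` and `clf` are names chosen for this example). The whole dataset is passed in, since cross_val_score() does the splitting itself, and the estimator is passed unfitted.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Toy data standing in for the full dataset (an assumption for illustration).
X_all, y_all = make_classification(n_samples=200, n_features=10, random_state=0)

kf = KFold(n_splits=4, shuffle=True, random_state=0)

# An unfitted scikit-learn estimator; cross_val_score clones and fits it per fold.
clf = LogisticRegression(max_iter=1000)

# One accuracy value is returned for each of the 4 folds.
scores = cross_val_score(clf, X_all, y_all, cv=kf, scoring="accuracy")
print(scores)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```

With this stand-in the call runs without the clone error, and `clf` itself is left unfitted afterwards, which may be what "discard the model" refers to.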
I would appreciate any help you can give me. Please take into consideration I am not a data scientist or a developer.