I am new to data science, so please bear with me for this long question. I am trying to do Speech Emotion Recognition with MLPClassifier on the RAVDESS and Crema datasets, and I am getting a high validation loss and a large gap between training and validation accuracy.
I am predicting only three emotion labels. I use an 80/10/10 train/validation/test split and 189 features per sample. Train set size: 3510, validation set size: 439, test set size: 439.
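For reference, the split is done on the audio files before any preprocessing, roughly like this (file_paths and labels are placeholders for my lists of clip paths and emotion labels; the stratification and random_state are illustrative):

from sklearn.model_selection import train_test_split

# 80/10/10: hold out 20% of the files, then split that half-and-half into validation and test
files_train, files_tmp, y_train, y_tmp = train_test_split(
    file_paths, labels, test_size=0.2, stratify=labels, random_state=42)
files_val, files_test, y_val, y_test = train_test_split(
    files_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)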
After splitting, I preprocessed all training clips to the same duration and extracted these features: MFCC, chroma, mel spectrogram, spectral contrast, zero-crossing rate (zcr), and RMS energy (rms). I then standardized all training samples with StandardScaler. I applied the same steps to the validation and test data separately before using them.
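The feature extraction looks roughly like this (a condensed sketch using librosa; the exact parameters are illustrative, but the per-feature dimensions add up to my 189 features):

import numpy as np
import librosa

def extract_features(signal, sr):
    # mean-pool each feature over time frames and concatenate into one vector
    mfcc = np.mean(librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40), axis=1)        # 40
    chroma = np.mean(librosa.feature.chroma_stft(y=signal, sr=sr), axis=1)          # 12
    mel = np.mean(librosa.feature.melspectrogram(y=signal, sr=sr), axis=1)          # 128
    contrast = np.mean(librosa.feature.spectral_contrast(y=signal, sr=sr), axis=1)  # 7
    zcr = np.mean(librosa.feature.zero_crossing_rate(signal))                       # 1
    rms = np.mean(librosa.feature.rms(y=signal))                                    # 1
    return np.hstack([mfcc, chroma, mel, contrast, zcr, rms])                       # 189 total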
I have done hyperparameter tuning like this:
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

model_params = {
    'alpha': 0.01,
    'early_stopping': True
}
model = MLPClassifier(**model_params)

param_grid = {
    'batch_size': [32, 64],
    'hidden_layer_sizes': [(100), (200), (200, 200), (300)],
    'max_iter': [50, 100, 200]
}

grid_search = GridSearchCV(estimator=model,
                           param_grid=param_grid,
                           scoring='accuracy',
                           refit=False,
                           cv=3,
                           verbose=4,
                           return_train_score=True)
grid_search.fit(X_train, y_train)
which determined these best_params:
{'batch_size': 32, 'hidden_layer_sizes': 200, 'max_iter': 50}
The mean_train_score and mean_test_score of cv_results_ look like this:
https://i.stack.imgur.com/6YwB6.png
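I pulled those curves out of cv_results_ roughly like this (the plotting details are illustrative):

import pandas as pd
import matplotlib.pyplot as plt

# one row per hyperparameter combination tried by the grid search
cv_results = pd.DataFrame(grid_search.cv_results_)

plt.plot(cv_results['mean_train_score'], label='mean_train_score')
plt.plot(cv_results['mean_test_score'], label='mean_test_score')
plt.xlabel('parameter combination index')
plt.ylabel('accuracy')
plt.legend()
plt.show()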
I then trained the model again on the entire training set:
import numpy as np
from sklearn.metrics import log_loss

# fold the tuned values (batch_size, hidden_layer_sizes, max_iter) into model_params
model_params.update(grid_search.best_params_)
model = MLPClassifier(**model_params)

train_loss_history = []
train_accuracy_history = []
val_loss_history = []
val_accuracy_history = []

for epoch in range(model_params['max_iter']):
    # one pass over the full training set
    model.partial_fit(X_train, y_train, classes=np.unique(y_train))

    # loss = model.loss_
    # train_loss_history.append(loss)
    train_probs = model.predict_proba(X_train)
    train_loss = log_loss(y_train, train_probs)
    train_loss_history.append(train_loss)

    val_probs = model.predict_proba(X_val)
    val_loss = log_loss(y_val, val_probs)
    val_loss_history.append(val_loss)

    train_accuracy = model.score(X_train, y_train)
    train_accuracy_history.append(train_accuracy)
    val_accuracy = model.score(X_val, y_val)
    val_accuracy_history.append(val_accuracy)

    print(f"Epoch {epoch + 1}/{model_params['max_iter']}: "
          f"Loss={train_loss:.4f}, Accuracy={val_accuracy:.4f}")
The training and validation loss/accuracy per epoch look like this:
https://i.stack.imgur.com/w1LB3.png
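The plot is produced from the recorded histories along these lines (a sketch):

import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.plot(train_loss_history, label='train loss')
ax1.plot(val_loss_history, label='validation loss')
ax1.set_xlabel('epoch')
ax1.set_ylabel('log loss')
ax1.legend()

ax2.plot(train_accuracy_history, label='train accuracy')
ax2.plot(val_accuracy_history, label='validation accuracy')
ax2.set_xlabel('epoch')
ax2.set_ylabel('accuracy')
ax2.legend()

plt.show()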
The validation loss is increasing and the gap between training and validation accuracy is large. How do I prevent this overfitting?
On the test set, this is my evaluation report:
  Predicted Labels Actual Labels
0            happy           sad
1            angry         angry
2            angry         angry
3            angry         angry
4            happy         happy
5            angry         angry
6            happy         happy
7              sad           sad
8              sad         happy
9            happy         happy
Classification Report:
              precision    recall  f1-score   support

       angry       0.80      0.79      0.80       154
       happy       0.70      0.66      0.68       146
         sad       0.81      0.88      0.84       139

    accuracy                           0.77       439
   macro avg       0.77      0.78      0.77       439
weighted avg       0.77      0.77      0.77       439
Accuracy: 77.45%
Log Loss: 0.79
F1 Score: 77.34%
Precision: 77.22%
Recall: 77.58%
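For completeness, the numbers above come from something like this (a sketch; I believe the single-number precision/recall/F1 use macro averaging):

import numpy as np
import pandas as pd
from sklearn.metrics import (classification_report, accuracy_score, log_loss,
                             f1_score, precision_score, recall_score)

y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)

# side-by-side comparison of the first ten predictions
print(pd.DataFrame({'Predicted Labels': y_pred,
                    'Actual Labels': np.asarray(y_test)}).head(10))

print("Classification Report:")
print(classification_report(y_test, y_pred))
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2%}")
print(f"Log Loss: {log_loss(y_test, y_proba):.2f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='macro'):.2%}")
print(f"Precision: {precision_score(y_test, y_pred, average='macro'):.2%}")
print(f"Recall: {recall_score(y_test, y_pred, average='macro'):.2%}")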