I will briefly explain my problem and the approaches I have tested so far.
I have a movie dataset and I am trying to predict 17 genres based on 4 columns (about actors, plot, content, reviews).
My target variable looks like this,
y_train=array([[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0],
[0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
# Could it be a problem that they are int32 rather than float32?
y_test=array([[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
As you can see, each row of boolean values may have up to three positive values. My current implementation has the following configuration:
Activation function of output layer: Sigmoid
Loss function: Binary_crossentropy
Metric function: Accuracy (binary, since the loss function is binary cross-entropy)
The results were very promising, with 0.98 accuracy and 0.003 loss on both the training and validation datasets.
Learning curves
No signs of overfit or underfit.
However, I suspected that such a high accuracy is due to the large number of negative values: the algorithm can predict the 0s very well and thus achieves high accuracy.
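To make this suspicion concrete, here is a minimal, dependency-free sketch using only the five y_train rows shown above. It scores a hypothetical "model" that predicts 0 for every label. The helper names (binary_accuracy, recall, all_zero_pred) are mine and only for illustration; the cell-wise accuracy is my assumption of how binary accuracy behaves on these targets.

```python
# The five y_train rows shown above, copied verbatim.
y_train_sample = [
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def binary_accuracy(y_true, y_pred):
    """Fraction of individual label cells that are predicted correctly."""
    cells = [(t, p)
             for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    return sum(t == p for t, p in cells) / len(cells)


def recall(y_true, y_pred):
    """Fraction of the positive cells that the prediction recovers."""
    cells = [(t, p)
             for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    positives = sum(t == 1 for t, _ in cells)
    true_positives = sum(t == 1 and p == 1 for t, p in cells)
    return true_positives / positives if positives else 0.0


# Hypothetical degenerate predictor: every genre is predicted as absent.
all_zero_pred = [[0] * len(row) for row in y_train_sample]

baseline_accuracy = binary_accuracy(y_train_sample, all_zero_pred)
baseline_recall = recall(y_train_sample, all_zero_pred)

print(f"accuracy={baseline_accuracy:.3f}, recall={baseline_recall:.1f}")
# accuracy=0.894, recall=0.0  (76 of the 85 cells are zeros)
```

On this tiny sample, a predictor that never finds a single genre already scores about 0.89, so a 0.98 accuracy on the full dataset does not by itself show that the positive labels are being learned.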
So I ran the following trials:
1st trial
Activation function of output layer: Sigmoid
Loss function: categorical_cross_entropy
Metric function: categorical_accuracy
The results are much worse. Very high accuracy and a totally unrepresentative validation dataset with many spikes.
2nd trial
Activation function of output layer: Sigmoid
Loss function: sigmoid_focal_loss (link)
Metric function: categorical_accuracy
The loss improved considerably, but accuracy remained in a poor range, so I concluded that categorical accuracy is not a suitable metric here.
3rd trial (I changed categorical accuracy to AUC)
Activation function of output layer: Sigmoid
Loss function: sigmoid_focal_loss (link)
Metric function: tf.keras.metrics.AUC(multi_label=True)
3rd trial results on test dataset (movies never seen before by the neural network classifier)
"Test Score (evalution of the model's loss/error on the test sequences): 0.026287764310836792"
"Test Accuracy (evalution of the model's auc on the test sequences): 0.99942547082901"
Based on the results of each trial, is it still valid to assume that the model's metric is affected by the imbalance between the 0 and 1 target values? Or is the neural network with the Adam optimizer robust and well generalized? I would appreciate your opinions on this matter.
[UPDATE]
Following a recommendation in the comments, I added class_weights, which produced the following error:
class_weights={0:1.0, 1:0.29}
Does Keras have any bug with the class weights argument?
Thanks a lot in advance.
[UPDATE] - 11.07.2020
I have decided to follow this plan:
Activation function of output layer: Sigmoid
Loss function: binary_crossentropy
Metric function: f1_score
I don't want to use the accuracy metric, since it is not appropriate for classification with many more negative labels than positive ones.
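For context, F1 on multi-label targets can be reported per label, macro-averaged (mean of the per-label scores), or micro-averaged (counts pooled over all labels first). The sketch below is a dependency-free illustration on a made-up 3-label toy example, not my real data; the function names are hypothetical, and I assume F1 is defined as 0 when a label has no true or predicted positives.

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined here as 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def label_counts(y_true, y_pred):
    """Return one (tp, fp, fn) tuple per label column."""
    counts = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        counts.append((tp, fp, fn))
    return counts


# Toy data (made up): label 0 is frequent, labels 1 and 2 are rare.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
# Hypothetical predictions that only ever detect the frequent label.
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0]]

counts = label_counts(y_true, y_pred)

# No averaging: one score per label.
per_label_f1 = [f1_from_counts(*c) for c in counts]

# Macro: unweighted mean of the per-label scores.
macro_f1 = sum(per_label_f1) / len(per_label_f1)

# Micro: sum tp, fp and fn over all labels, then compute a single F1.
micro_f1 = f1_from_counts(*(sum(col) for col in zip(*counts)))

print(per_label_f1)        # [1.0, 0.0, 0.0]
print(round(macro_f1, 3))  # 0.333
print(micro_f1)            # 0.75
```

In this toy case the micro average stays at 0.75 even though two of the three labels are never predicted, while the per-label and macro views expose the missed rare labels.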
My model.compile() method looks like this
model_for_pruning.compile(optimizer='adam',
                          loss='binary_crossentropy',
                          metrics=[tfa.metrics.F1Score(y_train[0].shape[-1], average=None)])
However, I am having a hard time choosing between the micro-averaged F1 score and the plain (unaveraged) F1 score, since my data are multi-label. My intuition is that micro averaging is more appropriate for multi-label data, but since I use sigmoid and binary_crossentropy, I believe no averaging should be applied to the F1 score. Thus, I tried to put sample weights on my classes.
from sklearn.utils.class_weight import compute_sample_weight
class_weights_sample = compute_sample_weight('balanced', y_train)
fitted_model = model_for_pruning.fit(
    [X_train_seq_actors, X_train_seq_plot, X_train_seq_features, X_train_seq_reviews],
    y_train,
    steps_per_epoch=int(np.ceil((X_train_seq_actors.shape[0]*optimizer_parameters['validation_split_ratio'])//hparams[HP_HIDDEN_UNITS])),
    epochs=fit_parameters["epoch"],
    batch_size=hparams[HP_HIDDEN_UNITS],
    validation_split=fit_parameters['validation_data_ratio'],
    callbacks=callbacks,
    use_multiprocessing=True,
    sample_weight=class_weights_sample
)
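For reference, this is my understanding of what compute_sample_weight('balanced', ...) does with a 2D multi-label target: it treats each column as a separate output, computes balanced class weights per column as n_samples / (n_classes * count), and multiplies the per-column weights into a single weight per sample. The pure-Python sketch below is only my approximation of that behaviour on made-up toy data; balanced_sample_weight is a hypothetical helper, not the scikit-learn implementation.

```python
from collections import Counter


def balanced_sample_weight(y):
    """Approximate 'balanced' sample weights for a 2D multi-label target.

    Assumption: each column is weighted independently and the per-column
    weights are multiplied together, giving one scalar per sample.
    """
    n_samples = len(y)
    weights = [1.0] * n_samples
    for column in zip(*y):               # iterate over label columns
        counts = Counter(column)         # occurrences of each value (0 / 1)
        n_classes = len(counts)          # a constant column has one class
        for i, value in enumerate(column):
            weights[i] *= n_samples / (n_classes * counts[value])
    return weights


# Toy target (made up): 4 samples, 2 labels, each label positive once.
y_toy = [[1, 0], [0, 0], [0, 0], [0, 1]]
toy_weights = balanced_sample_weight(y_toy)

print([round(w, 3) for w in toy_weights])
# [1.333, 0.444, 0.444, 1.333]
```

Note that the result is one weight per movie rather than one weight per genre, so a rare genre is only up-weighted indirectly through the samples that contain it.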
Is this a valid, typical approach, or am I missing something? Please note that I am asking about the validity of the approach, not whether the code runs, because everything runs successfully.