
I have a highly imbalanced dataset (3% Yes, 87% No) of textual documents, each containing a title and abstract feature. I have transformed these documents into tf.data.Dataset entities with padded batches. Now I am trying to train a Deep Learning model on this dataset. model.fit() in TensorFlow has a class_weight parameter to deal with class imbalance; however, I am searching for the best hyperparameters using the keras-tuner library, and its hyperparameter tuners do not expose such an option. Therefore, I am looking for other ways to deal with class imbalance.

Is there an option to use class weights in keras-tuner? In addition, I am already using the precision-at-recall metric. I could also try a data resampling method, such as imblearn.over_sampling.SMOTE, but as this Kaggle post mentions:

It appears that SMOTE does not help improve the results. However, it makes the network learn faster. Moreover, there is one big problem: this method is not compatible with larger datasets. You have to apply SMOTE on embedded sentences, which takes way too much memory.
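For reference, this is a minimal sketch of how I compute per-class weights from the label distribution, in the style of the TensorFlow imbalanced-data tutorial (the label array here is illustrative, matching the stated imbalance):

```python
import numpy as np

# Illustrative labels: 87 "No" (0) and 3 "Yes" (1) samples
labels = np.array([0] * 87 + [1] * 3)

neg, pos = np.bincount(labels)
total = neg + pos

# Weight each class inversely to its frequency, so that both
# classes contribute roughly equally to the loss
class_weight = {0: total / (2.0 * neg), 1: total / (2.0 * pos)}
# -> {0: 90/174 (~0.517), 1: 15.0}
```

This dict is what model.fit(..., class_weight=class_weight) would accept; the open question is how to pass it through keras-tuner.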

rvdinter

2 Answers


If you are looking for other methods to deal with imbalanced data, you may consider generating synthetic data using the SMOTE or ADASYN packages. This usually works. I see you have already considered this as an option to explore.

Praks

You could change the evaluation metric to fbeta_score (a weighted F-score).

Or if the dataset is large enough, you can try undersampling.
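For illustration, a minimal example of scikit-learn's fbeta_score with hypothetical predictions (beta > 1 weights recall more heavily than precision, which is often what you want on imbalanced data):

```python
from sklearn.metrics import fbeta_score

# Hypothetical labels/predictions for an imbalanced binary problem
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

# beta=2 emphasizes recall over precision
score = fbeta_score(y_true, y_pred, beta=2)
print(score)  # 0.75 here, since precision = recall = 0.75
```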

Malvika
    the tuner documentation states "Run the hyperparameter search. The arguments for the search method are the same as those used for tf.keras.model.fit". Did you try to use the class_weight parameter? – Gerry P Oct 12 '20 at 16:48
  • @GerryP can you share the link to that doc? I cannot find whether `kt.Hyperband` can use the class_weight parameter – rvdinter Oct 13 '20 at 08:53