I have an NLP/text classification problem with a very skewed class distribution: class 0 - 98%, class 1 - 2%.
For my training and validation data I oversample, so the class distribution becomes class 0 - 55%, class 1 - 45%. The test data keeps the original skewed distribution.
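For reference, a minimal sketch of the kind of minority-class oversampling I mean (illustrative only; this is not necessarily the exact resampling code in my pipeline):

import numpy as np

def oversample_indices(labels, target_pos_frac=0.45, seed=0):
    # Duplicate class-1 rows until they make up ~target_pos_frac of the resampled set.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    # solve n_pos / (n_pos + n_neg) = target_pos_frac for n_pos
    n_pos = int(round(target_pos_frac * len(neg_idx) / (1 - target_pos_frac)))
    return np.concatenate([neg_idx, rng.choice(pos_idx, size=n_pos, replace=True)])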
I built a model using nn.BCEWithLogitsLoss(pos_weight=tensor(1.2579, device='cuda:0')). The pos_weight was calculated from the 55/45 class distribution in the training data. On class 1 of the test data I got an F1 of 0.07, with

true negatives, false positives, false negatives, true positives = (28809, 13258, 537, 495)
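For reference, the pos_weight value comes from the usual negatives-over-positives ratio on the (oversampled) training labels, roughly as below (sketch only; train_labels is a placeholder for my training label tensor):

import torch
import torch.nn as nn

def bce_pos_weight(train_labels):
    # pos_weight = (# negative examples) / (# positive examples) in the training set
    labels = torch.as_tensor(train_labels, dtype=torch.float32)
    n_pos = labels.sum()
    return (labels.numel() - n_pos) / n_pos

# usage (train_labels = my training label tensor):
# criterion = nn.BCEWithLogitsLoss(pos_weight=bce_pos_weight(train_labels).to('cuda:0'))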
I switched to focal loss using the code below, and performance did not improve much. The F1 on class 1 of the test data is still about the same, and

true negatives, false positives, false negatives, true positives = (32527, 9540, 640, 392)

kornia.losses.binary_focal_loss_with_logits(probssss, labelsss, alpha=0.25, gamma=2.0, reduction='mean')
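For reference, alpha and gamma enter the binary focal loss roughly as in the sketch below (an illustrative re-implementation, not kornia's exact code): alpha weights the positive class versus the negative class, and gamma controls how strongly easy, high-confidence examples are down-weighted.

import torch

def binary_focal_loss_sketch(logits, targets, alpha=0.25, gamma=2.0):
    # p_t: predicted probability of the true class for each example
    probs = torch.sigmoid(logits)
    p_t = torch.where(targets == 1, probs, 1 - probs)
    # alpha for positives, (1 - alpha) for negatives
    alpha_t = torch.where(targets == 1,
                          torch.full_like(probs, alpha),
                          torch.full_like(probs, 1 - alpha))
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples
    loss = -alpha_t * (1 - p_t) ** gamma * torch.log(p_t.clamp_min(1e-8))
    return loss.mean()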
- Are my alpha and gamma parameters wrong? Are there specific values I should try? I could tune them, but that would take a lot of time and resources, so I am looking for recommendations.
- For nn.BCEWithLogitsLoss(pos_weight=tensor(1.2579, device='cuda:0')), should I use a different value for pos_weight? Please keep in mind that my goal is to maximize F1 on class 1 of the test data.
# Update
I am building a CNN with GloVe embeddings: I take my text, remove all punctuation, and look up the GloVe vector for each token; apart from that there is no other major data cleaning. I am interested in tuning the focal loss parameters alpha and gamma (a sketch of the kind of grid I have in mind follows the model code below).
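For reference, the GloVe lookup side is set up along these lines (simplified sketch; glove_path and word2idx are placeholders for my embedding file and vocabulary):

import numpy as np
import torch

def load_glove_matrix(glove_path, word2idx, embed_dim=300):
    # Row i holds the GloVe vector of the word with index i (zeros if not in GloVe).
    embedding = np.zeros((len(word2idx), embed_dim), dtype=np.float32)
    with open(glove_path, encoding='utf-8') as f:
        for line in f:
            parts = line.rstrip().split(' ')
            word, values = parts[0], parts[1:]
            if word in word2idx and len(values) == embed_dim:
                embedding[word2idx[word]] = np.asarray(values, dtype=np.float32)
    return torch.from_numpy(embedding)  # passed to the model as pretrained_embedding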
My model is as below
import numpy as np
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self,
                 pretrained_embedding,
                 embed_dim,
                 filter_sizes,
                 num_filters,
                 fc1_neurons,
                 fc2_neurons,
                 dropout):
        super(CNN, self).__init__()

        # Embedding layer (frozen pretrained GloVe vectors)
        self.vocab_size, self.embed_dim = pretrained_embedding.shape
        self.embedding = nn.Embedding.from_pretrained(pretrained_embedding,
                                                      freeze=True)

        # Conv network: one Conv1d per filter size over the embedding channels
        self.conv1d_list = nn.ModuleList([
            nn.Conv1d(in_channels=self.embed_dim,
                      out_channels=num_filters[i],
                      kernel_size=filter_sizes[i])
            for i in range(len(filter_sizes))
        ])

        # Batchnorm over the concatenated conv outputs
        self.batch_norm1 = nn.BatchNorm1d(num_filters[0] * len(filter_sizes))

        # Dropout layer
        self.dropout = nn.Dropout(p=dropout)

        # ReLU activation function
        self.relu = nn.ReLU()

        # Fully-connected layers
        # self.fc1 = nn.Linear(np.sum(num_filters), fc1_neurons)
        # BatchNorm1d needs the total channel count (num_filters is a list)
        self.batch_norm2 = nn.BatchNorm1d(int(np.sum(num_filters)))
        self.fc2 = nn.Linear(np.sum(num_filters), fc2_neurons)
        self.batch_norm3 = nn.BatchNorm1d(fc2_neurons)
        self.fc3 = nn.Linear(fc2_neurons, 1)
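And this is the kind of small alpha/gamma grid I would like to avoid running exhaustively (sketch only; evaluate_f1 is a hypothetical helper that trains with the given focal-loss settings and returns F1 on class 1 of the validation data; the candidate values are just examples, not recommendations):

import itertools

alphas = [0.25, 0.5, 0.75, 0.9]
gammas = [0.5, 1.0, 2.0, 5.0]

best = None
for alpha, gamma in itertools.product(alphas, gammas):
    # evaluate_f1 is hypothetical: train with
    # kornia.losses.binary_focal_loss_with_logits(..., alpha=alpha, gamma=gamma)
    # and score F1 on class 1 of the validation set
    score = evaluate_f1(alpha, gamma)
    if best is None or score > best[0]:
        best = (score, alpha, gamma)
print(best)  # (best F1, best alpha, best gamma)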