I am currently working on a text classification model that assigns one of two labels to each post.
For example:
"Hey how are you doing" | Approve
"You are really dumb" | Disapprove
The model either approves or disapproves a post, based on criteria such as toxicity.
Now I would like to add a second layer of labels that specifies the reason why a post should be disapproved.
For example:
"You are really dumb" | Disapprove | Flaming
"You can buy this here" | Disapprove | Advertisement
"Hey you are cool" | Approve
So now I wonder: how can I add this kind of multi-level label classification to my current code?
Right now my training data (data.csv) looks like this; I separate the text and each label with a ³ character:
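To make the two label layers concrete, here is a minimal sketch of how each post's labels could be turned into two integer targets, one per layer. The names `DECISIONS`, `REASONS`, and `encode_labels` are hypothetical, and the `"None"` reason for approved posts is an assumption, not something the current data contains.

```python
# Hypothetical label vocabularies. "None" is an assumed placeholder reason
# for approved posts, which have no disapproval reason.
DECISIONS = ["Approve", "Disapprove"]
REASONS = ["None", "Flaming", "Advertisement"]

decision_to_index = {name: i for i, name in enumerate(DECISIONS)}
reason_to_index = {name: i for i, name in enumerate(REASONS)}


def encode_labels(decision, reason=None):
    """Map one (decision, reason) pair to a pair of integer class ids."""
    if decision == "Approve":
        reason = "None"  # approved posts never carry a reason
    elif reason is None:
        raise ValueError("a disapproved post needs a reason")
    return decision_to_index[decision], reason_to_index[reason]


examples = [
    ("You are really dumb", "Disapprove", "Flaming"),
    ("You can buy this here", "Disapprove", "Advertisement"),
    ("Hey you are cool", "Approve", None),
]
encoded = [encode_labels(decision, reason) for _, decision, reason in examples]
print(encoded)  # [(1, 1), (1, 2), (0, 0)]
```

Keeping the two layers as separate integer targets means each one can later be fed to its own output of a model.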
"thank you for the good idea"³Approve
"hallo wie geht es dir heute"³Foreign Language³Disapprove
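My current code looks like this:
# Imports (assuming TensorFlow 2.x with the legacy Keras preprocessing utilities)
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Embedding, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

# Load data: one post per line, fields separated by '³'
def load_data(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    posts, labels = [], []
    for line in lines:
        # Expects exactly two fields per line: post and label
        post, label = line.strip().split('³')
        posts.append(post)
        labels.append(label)
    return posts, labels

train_posts, train_labels = load_data('mixed_train_data.csv')
test_posts, test_labels = load_data('mixed_test_data.csv')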
# Label mapping
label_to_index = {
    'Approve': 0,
    'Disapprove': 1
}
# Tokenization and Padding
tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_posts)
sequences = tokenizer.texts_to_sequences(train_posts)
max_sequence_length = 7500
X = pad_sequences(sequences, maxlen=max_sequence_length)
y = np.array([label_to_index[label] for label in train_labels])
test_sequences = tokenizer.texts_to_sequences(test_posts)
test_X = pad_sequences(test_sequences, maxlen=max_sequence_length)
test_y = np.array([label_to_index[label] for label in test_labels])
# Model architecture
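model = Sequential([
    # Tokenizer indices start at 1 (0 is reserved for padding), so the vocabulary size is len(word_index) + 1
    Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=50, input_length=max_sequence_length),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(2, activation='softmax')
])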
# Model compilation
learning_rate = 0.001
model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate), metrics=['accuracy'])
# Model training
# checkpoint_callback is not defined in this snippet; it is assumed to be created elsewhere
model.fit(X, y, epochs=100, batch_size=32, validation_data=(test_X, test_y), callbacks=[checkpoint_callback])
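Note that the loader above unpacks exactly two fields per line, so a three-field line such as the "Foreign Language" example would raise a ValueError. A minimal sketch of a parser that accepts both shapes follows. The name `parse_line` is hypothetical, and the field order (post first, decision last, optional reason in between) is assumed from the two sample lines.

```python
def parse_line(line, sep="³"):
    """Split one data line into (post, decision, reason).

    Assumes the layout of the two sample lines: the post comes first,
    the Approve/Disapprove decision comes last, and an optional reason
    sits in between.
    """
    fields = line.strip().split(sep)
    if len(fields) == 2:
        post, decision = fields
        reason = None  # approved posts carry no reason
    elif len(fields) == 3:
        post, reason, decision = fields
    else:
        raise ValueError(f"unexpected number of fields: {line!r}")
    return post, decision, reason


rows = [
    '"thank you for the good idea"³Approve',
    '"hallo wie geht es dir heute"³Foreign Language³Disapprove',
]
parsed = [parse_line(row) for row in rows]
for post, decision, reason in parsed:
    print(post, decision, reason)
```

A post that itself contains the ³ character would still break this split, so the sketch only holds as long as the separator never appears in the text.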
I would appreciate some help updating it for this multi-level label classification, since I don't know where to start.
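One possible starting point, sketched here under assumptions rather than as a definitive answer, is a Keras functional model with a shared body and two softmax outputs: one for Approve/Disapprove and one for the reason. The sizes below are illustrative, random integers stand in for the tokenized posts, and reason class 0 is assumed to mean "no reason" for approved posts.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative sizes only, not taken from the original data.
vocab_size = 1000          # would be len(tokenizer.word_index) + 1
max_sequence_length = 50
num_reasons = 4            # assumed: None, Flaming, Advertisement, Foreign Language

# Shared body, mirroring the Embedding -> Flatten -> Dense stack of the original model.
inputs = layers.Input(shape=(max_sequence_length,), dtype="int32")
x = layers.Embedding(input_dim=vocab_size, output_dim=50)(inputs)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)

# Two heads: one per label layer.
decision_out = layers.Dense(2, activation="softmax", name="decision")(x)
reason_out = layers.Dense(num_reasons, activation="softmax", name="reason")(x)

model = tf.keras.Model(inputs=inputs, outputs=[decision_out, reason_out])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    # One loss and one metric list per output, in output order.
    loss=["sparse_categorical_crossentropy", "sparse_categorical_crossentropy"],
    metrics=[["accuracy"], ["accuracy"]],
)

# Random stand-in data, only to show the expected shapes.
rng = np.random.default_rng(0)
X = rng.integers(1, vocab_size, size=(8, max_sequence_length))
y_decision = rng.integers(0, 2, size=(8,))
y_reason = rng.integers(0, num_reasons, size=(8,))

model.fit(X, [y_decision, y_reason], epochs=1, batch_size=4, verbose=0)

decision_probs, reason_probs = model.predict(X, verbose=0)
decisions = decision_probs.argmax(axis=1)
# Only report a reason when the post is disapproved (decision == 1);
# otherwise fall back to the assumed "None" class 0.
reasons = np.where(decisions == 1, reason_probs.argmax(axis=1), 0)
```

Both heads are trained jointly on the same shared features. An alternative design would be a single softmax over combined classes (Approve, Disapprove/Flaming, Disapprove/Advertisement, ...), which is simpler but loses the explicit two-layer structure.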