
Currently I am working on a text classification model that assigns one of two labels to each post.

As an example:

  • "Hey how are you doing" | Approve
  • "You are really dumb" | Disapprove

The model either approves or disapproves a post, based on toxicity or similar criteria.

Now I would like to add a second layer of labels that specifies the reason why a post should be disapproved (see the sketch after the examples).

As an example:

  • "You are really dumb" | Disapprove | Flaming
  • "You can buy this here" | Disapprove | Advertisement
  • "Hey you are cool" | Approve

So now I wonder: how can I add this multi-layer label classification to my current code?

Right now my training data (data.csv) looks like this; I split the text and each label with a ³ character:

"thank you for the good idea"³Approve
"hallo wie geht es dir heute"³Foreign Language³Disapprove

My current code looks like this:

# Imports
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense

# Load data (assumes exactly one label per line, i.e. the old two-field format)
def load_data(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    posts, labels = [], []
    for line in lines:
        post, label = line.strip().split('³')
        posts.append(post)
        labels.append(label)
    return posts, labels

train_posts, train_labels = load_data('mixed_train_data.csv')
test_posts, test_labels = load_data('mixed_test_data.csv')

# Label mapping
label_to_index = { 
    'Approve': 0,
    'Disapprove': 1
}

# Tokenization and Padding
tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_posts)
sequences = tokenizer.texts_to_sequences(train_posts)
max_sequence_length = 7500
X = pad_sequences(sequences, maxlen=max_sequence_length)
y = np.array([label_to_index[label] for label in train_labels])

test_sequences = tokenizer.texts_to_sequences(test_posts)
test_X = pad_sequences(test_sequences, maxlen=max_sequence_length)
test_y = np.array([label_to_index[label] for label in test_labels])

# Model architecture
model = Sequential([
    Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=50, input_length=max_sequence_length),  # +1 because Tokenizer indices start at 1
    Flatten(),
    Dense(128, activation='relu'),
    Dense(2, activation='softmax')
])

# Model compilation
learning_rate = 0.001
model.compile(loss='sparse_categorical_crossentropy', optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate), metrics=['accuracy'])

# Model training (placeholder checkpoint so the snippet runs; the filename is arbitrary)
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint('model_checkpoint.h5', save_best_only=True)
model.fit(X, y, epochs=100, batch_size=32, validation_data=(test_X, test_y), callbacks=[checkpoint_callback])

I could use some help updating it for the multi-layer label classification, since I don't know where to start.
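
To show what I mean, here is a rough, untested sketch of the direction I imagine: the same embedding feeding two softmax heads, one for Approve/Disapprove and one for the reason (using the reason_to_index mapping from my sketch above, plus an extra "no reason" class for approved posts). I don't know if this is the right approach:

from tensorflow.keras.layers import Input, Embedding, Flatten, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(max_sequence_length,))
x = Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=50)(inputs)
x = Flatten()(x)
x = Dense(128, activation='relu')(x)

decision_output = Dense(2, activation='softmax', name='decision')(x)
reason_output = Dense(len(reason_to_index) + 1, activation='softmax', name='reason')(x)  # +1 = "no reason" class

model = Model(inputs=inputs, outputs=[decision_output, reason_output])
model.compile(
    loss={'decision': 'sparse_categorical_crossentropy', 'reason': 'sparse_categorical_crossentropy'},
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    metrics=['accuracy']
)

# Training would then need two label arrays, e.g.:
# model.fit(X, {'decision': y_decision, 'reason': y_reason}, epochs=100, batch_size=32)

Is something like that the right direction, or is there a better way to handle the second label?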
