Can someone walk me through building a biLSTM model for multiclass classification (7 classes) on text data? The data comes from a Kaggle dataset (https://www.kaggle.com/datasets/rmisra/news-category-dataset). I labelled it into 7 classes as shown below (six named categories, plus label 6 for everything else), and then used embeddings to get arrays with the shapes listed after the code:
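```python
label_dict = {'CRIME': 0, 'BUSINESS': 1, 'SPORTS': 2, 'WEDDINGS': 3, 'DIVORCE': 4, 'PARENTING': 5}
# Categories missing from label_dict become NaN after .map(), then class 6 ("other")
df['label'] = df['category'].map(label_dict).fillna(6).astype(int)
```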
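To see what this mapping produces, here is a toy sketch on made-up rows (the `toy` DataFrame and its categories are hypothetical, not taken from the dataset). Each row ends up with a single integer label, and any category not in `label_dict` falls through to 6, which is why the label arrays are 1-D:

```python
import pandas as pd

label_dict = {'CRIME': 0, 'BUSINESS': 1, 'SPORTS': 2, 'WEDDINGS': 3, 'DIVORCE': 4, 'PARENTING': 5}

# Hypothetical toy rows; 'POLITICS' and 'TRAVEL' are not in label_dict
toy = pd.DataFrame({'category': ['CRIME', 'POLITICS', 'SPORTS', 'TRAVEL', 'PARENTING']})
toy['label'] = toy['category'].map(label_dict).fillna(6).astype(int)

labels = toy['label'].to_numpy()
print(labels)        # [0 6 2 6 5]  -> one integer per row
print(labels.shape)  # (5,)         -> 1-D, like y_train
```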
```
X_train shape: (171812, 384)
y_train shape: (171812,)
X_test shape:  (37715, 384)
y_test shape:  (37715,)
```
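The `Embedding` layer in the model expects integer token ids in the range `[0, input_dim)`, one id per position, not floating-point vectors. A width of 384 is also the output size of some sentence-embedding models, so it may be worth checking which kind of array `X_train` actually is. A minimal numpy sketch, assuming `input_dim = 10000` as in the model parameters; the helper `looks_like_token_ids` and the toy arrays are hypothetical:

```python
import numpy as np

def looks_like_token_ids(X, input_dim=10000):
    """True if X could feed Embedding(input_dim, ...):
    integer dtype with every value in [0, input_dim)."""
    return bool(np.issubdtype(X.dtype, np.integer)
                and X.min() >= 0
                and X.max() < input_dim)

# Hypothetical examples of the two kinds of (n, 384) array
token_ids = np.random.randint(0, 10000, size=(4, 384))       # padded token ids
sentence_vecs = np.random.randn(4, 384).astype(np.float32)   # dense sentence embeddings

print(looks_like_token_ids(token_ids))      # True
print(looks_like_token_ids(sentence_vecs))  # False
```

If this returns `False` for the real `X_train`, the 384 values are more likely one dense vector per headline than a sequence of 384 tokens, and the `Embedding` + LSTM front end would need rethinking.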
This is the biLSTM model I am trying to build:
```python
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import regularizers

# parameters
DENSE1_DIM = 64
DENSE2_DIM = 32
LSTM1_DIM = 32
LSTM2_DIM = 16
WD = 0.001
FILTERS = 64
input_dim = 10000
output_dim = 128
max_length = 384

# Model Definition
model_lstm = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim, output_dim, input_length=max_length),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM1_DIM, dropout=0.2, kernel_regularizer=regularizers.l2(WD), return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM2_DIM, dropout=0.2, kernel_regularizer=regularizers.l2(WD), return_sequences=True)),
    tf.keras.layers.Dense(DENSE1_DIM, activation='relu', kernel_regularizer=regularizers.l2(WD)),
    tf.keras.layers.Dense(DENSE2_DIM, activation='relu'),
    tf.keras.layers.Dense(7, activation='softmax')
])

# Set the training parameters
model_lstm.compile(loss='categorical_crossentropy',
                   optimizer=tf.keras.optimizers.Adam(),
                   # metrics=[tf.keras.metrics.Accuracy()])
                   metrics=[tfa.metrics.F1Score(average="macro", threshold=None, num_classes=7, name='f1_score', dtype=None)])
model_lstm.summary()
```
```
Model: "sequential_20"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding_18 (Embedding)    (None, 384, 128)          1280000
 bidirectional_30 (Bidirecti  (None, 384, 64)          41216
 onal)
 dense_56 (Dense)            (None, 384, 64)           4160
 dense_57 (Dense)            (None, 384, 32)           2080
 dense_58 (Dense)            (None, 384, 7)            231
=================================================================
```
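The parameter counts in this summary can be checked by hand with the usual formulas, assuming Keras' default layers with biases: Embedding has `input_dim * output_dim` weights, one LSTM has `4 * units * (input_features + units + 1)` (four gates, each with input weights, recurrent weights and a bias), `Bidirectional` doubles that, and Dense has `inputs * units + units`. A small plain-Python sketch; the helper names are hypothetical:

```python
def embedding_params(input_dim, output_dim):
    return input_dim * output_dim

def bilstm_params(input_features, units):
    # Bidirectional wraps two independent LSTMs (forward and backward)
    return 2 * 4 * units * (input_features + units + 1)

def dense_params(inputs, units):
    return inputs * units + units

print(embedding_params(10000, 128))  # 1280000
print(bilstm_params(128, 32))        # 41216
print(dense_params(64, 64))          # 4160 -> this Dense(64) received 64 features
print(dense_params(64, 32))          # 2080
print(dense_params(32, 7))           # 231

# What the second Bidirectional(LSTM(16)) in the code would add,
# and what Dense(64) would cost on the 32 features it outputs:
print(bilstm_params(64, 16))         # 10368
print(dense_params(32, 64))          # 2112
```

The printed numbers match a model with a single `Bidirectional(LSTM(32))` feeding `Dense(64)`, which suggests the summary came from a version of the model without the second Bidirectional layer. Either way, the 384-step time axis is carried all the way to the last Dense layer.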
When I train it, I get `ValueError: Shapes (None, 1) and (None, 384, 7) are incompatible`:
```python
epochs = 12
batch_size = 250

history = model_lstm.fit(X_train, y_train,
                         epochs=epochs,
                         validation_data=(X_test, y_test),
                         batch_size=batch_size)
```
```
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[70], line 4
      1 epochs = 12
      2 batch_size = 250
----> 4 history = model_lstm.fit(X_train, y_train,
      5                          epochs=epochs,
      6                          validation_data=(X_test, y_test),
      7                          batch_size=batch_size)

File ~\anaconda3\lib\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~\AppData\Local\Temp\__autograph_generated_file1v7jvx9c.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
     13 try:
     14     do_return = True
---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16 except:
     17     do_return = False

ValueError: Shapes (None, 1) and (None, 384, 7) are incompatible.
```
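The `(None, 384, 7)` in the error comes from `return_sequences=True` on the second LSTM: it makes the layer return one vector per time step, so every following `Dense` layer is applied at each of the 384 positions. Setting `return_sequences=False` on the last recurrent layer collapses the sequence into one vector per example. A minimal sketch of that change, assuming `X_train` holds integer token ids and the labels stay as integers 0 to 6 (hence `sparse_categorical_crossentropy` and plain accuracy in place of the tfa F1 metric, which expects one-hot targets):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import regularizers

# Same hyperparameters as in the question
LSTM1_DIM, LSTM2_DIM = 32, 16
DENSE1_DIM, DENSE2_DIM = 64, 32
WD = 0.001
input_dim, output_dim, max_length = 10000, 128, 384

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_length,), dtype="int32"),
    tf.keras.layers.Embedding(input_dim, output_dim),
    # Keep the full sequence here so the next LSTM sees all 384 steps
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        LSTM1_DIM, dropout=0.2, kernel_regularizer=regularizers.l2(WD),
        return_sequences=True)),
    # Last recurrent layer: return_sequences=False -> one vector per example
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(
        LSTM2_DIM, dropout=0.2, kernel_regularizer=regularizers.l2(WD),
        return_sequences=False)),
    tf.keras.layers.Dense(DENSE1_DIM, activation="relu",
                          kernel_regularizer=regularizers.l2(WD)),
    tf.keras.layers.Dense(DENSE2_DIM, activation="relu"),
    tf.keras.layers.Dense(7, activation="softmax"),
])

# Integer labels 0..6 -> sparse loss, no one-hot encoding needed
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])

dummy = np.zeros((2, max_length), dtype="int32")  # two all-padding inputs
print(model.predict(dummy, verbose=0).shape)      # (2, 7)
```

With this output shape, the existing `fit(X_train, y_train, ...)` call can take the 1-D integer labels directly.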
Can someone explain in simple terms how to get the shapes right for a biLSTM model with my data? I don't understand where `Shapes (None, 1)` comes from.
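The `(None, 1)` in the error appears to be the label side: `y_train` has one integer per example, whereas `categorical_crossentropy` and `tfa.metrics.F1Score` expect one row of 7 values per example (one-hot targets). One way to keep that loss and metric is to one-hot encode the labels before calling `fit`. A numpy sketch with hypothetical toy labels; `tf.keras.utils.to_categorical(y, num_classes=7)` does the same job:

```python
import numpy as np

NUM_CLASSES = 7

def one_hot(labels, num_classes=NUM_CLASSES):
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype=np.float32)[labels]

y_toy = np.array([0, 6, 2, 6, 5])      # hypothetical labels, same format as y_train
Y_toy = one_hot(y_toy)
print(y_toy.shape, "->", Y_toy.shape)  # (5,) -> (5, 7)
print(Y_toy[1])                        # [0. 0. 0. 0. 0. 0. 1.]
```

The training call would then pass `one_hot(y_train)` and `one_hot(y_test)`. This only fixes the label side; the model output must also be `(None, 7)`, which requires the last LSTM to stop returning the full sequence.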