Can someone walk me through how to build a biLSTM model for multiclass classification (7 classes) using text data? The data is from a Kaggle dataset (https://www.kaggle.com/datasets/rmisra/news-category-dataset). I have labelled it into 7 categories and then used embeddings to get arrays of the following shapes:

label_dict = {'CRIME':0, 'BUSINESS':1, 'SPORTS':2 ,'WEDDINGS':3, 'DIVORCE':4, 'PARENTING':5}
        
df['label'] = df['category'].map(label_dict).fillna(6).astype(int)
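
For illustration, the mapping above behaves like the plain-Python sketch below: the six named categories get ids 0 to 5 and everything else falls into class 6. The sample category list is made up for the example; only label_dict comes from my code.

```python
# Same dictionary as above: six named categories, ids 0..5.
label_dict = {'CRIME': 0, 'BUSINESS': 1, 'SPORTS': 2,
              'WEDDINGS': 3, 'DIVORCE': 4, 'PARENTING': 5}

# Hypothetical sample of category strings (not taken from the dataset).
categories = ['SPORTS', 'POLITICS', 'CRIME', 'TRAVEL']

# dict.get with a default plays the role of .map(...).fillna(6).astype(int):
# any category missing from label_dict becomes the catch-all class 6.
labels = [label_dict.get(c, 6) for c in categories]

print(labels)  # [2, 6, 0, 6]
```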

X_train data shape - (171812, 384)
y_train data shape - (171812,)
X_test data shape - (37715, 384)
y_test data shape - (37715,)

I am trying to build a biLSTM model:

# Parameters
DENSE1_DIM = 64
DENSE2_DIM = 32
LSTM1_DIM = 32 
LSTM2_DIM = 16
WD = 0.001
FILTERS = 64

input_dim= 10000
output_dim =128
max_length =384

# Model Definition 
model_lstm = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim, output_dim, input_length=max_length),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM1_DIM, dropout=0.2, kernel_regularizer = regularizers.l2(WD), return_sequences=True)), 
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM2_DIM, dropout=0.2, kernel_regularizer = regularizers.l2(WD), return_sequences=True)),
    tf.keras.layers.Dense(DENSE1_DIM, activation='relu', kernel_regularizer = regularizers.l2(WD)), 
    tf.keras.layers.Dense(DENSE2_DIM, activation='relu'),    
    tf.keras.layers.Dense(7, activation='softmax')
])

# Set the training parameters
model_lstm.compile(loss='categorical_crossentropy',
                   optimizer=tf.keras.optimizers.Adam(),
                   # metrics=[tf.keras.metrics.Accuracy()],
                   metrics=[tfa.metrics.F1Score(average="macro", threshold=None, num_classes=7, name='f1_score', dtype=None)])

model_lstm.summary()
Model: "sequential_20"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_18 (Embedding)    (None, 384, 128)          1280000   
                                                                 
 bidirectional_30 (Bidirecti  (None, 384, 64)          41216     
 onal)                                                           
                                                                 
 dense_56 (Dense)            (None, 384, 64)           4160      
                                                                 
 dense_57 (Dense)            (None, 384, 32)           2080      
                                                                 
 dense_58 (Dense)            (None, 384, 7)            231       
                                                                 
=================================================================

Then I try to train it and get this error: ValueError: Shapes (None, 1) and (None, 384, 7) are incompatible.

history = model_lstm.fit(X_train, y_train,
          epochs=epochs,
          validation_data=(X_test, y_test),
          batch_size=batch_size)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[70], line 4
      1 epochs = 12
      2 batch_size = 250
----> 4 history = model_lstm.fit(X_train, y_train,
      5           epochs=epochs,
      6           validation_data=(X_test, y_test),
      7           batch_size=batch_size)

File ~\anaconda3\lib\site-packages\keras\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~\AppData\Local\Temp\__autograph_generated_file1v7jvx9c.py:15, in outer_factory.<locals>.inner_factory.<locals>.tf__train_function(iterator)
     13 try:
     14     do_return = True
---> 15     retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
     16 except:
     17     do_return = False

ValueError: Shapes (None, 1) and (None, 384, 7) are incompatible.

Can someone explain in simple words how to get the shapes right for a biLSTM model with my data? I do not quite understand where the shape (None, 1) comes from.

Bluetail

1 Answer

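The issue lies in your output layer and your labels. The shape (None, 1) means (BATCH_SIZE, 1): it comes from y_train and y_test, which hold a single integer label per sample. Your model output, on the other hand, has shape (None, 384, 7), because every layer keeps the 384-step time axis. With that output the loss would expect labels of shape (None, 384, 7), which is completely different from what you pass in.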
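
To make the mismatch concrete, here is a small NumPy sketch. NumPy arrays stand in for the Keras tensors, the batch of 4 labels is made up, and the reshape mimics what Keras does with 1-D targets as far as I know, so treat it as an illustration rather than the actual Keras internals.

```python
import numpy as np

# Batch size is arbitrary here; 384 and 7 come from the question.
batch, timesteps, num_classes = 4, 384, 7

# Integer labels as stored in y_train: one class id per sample.
y_batch = np.array([0, 6, 2, 5])

# Assumption: 1-D targets get a trailing axis of size 1, which is the (None, 1) in the error.
y_as_seen = y_batch.reshape(-1, 1)

# Stand-in for the model output: one 7-way probability vector per timestep.
y_pred = np.full((batch, timesteps, num_classes), 1.0 / num_classes)

print(y_as_seen.shape)  # (4, 1)
print(y_pred.shape)     # (4, 384, 7)
```

The two shapes differ in rank as well as in the last axis, which is why both the labels and the model need to change.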

As I understand your problem, you need two changes. First, one-hot encode your 7 labels so that y_train and y_test have a last axis of size 7. Second, add a Flatten() layer after your BiLSTM layers to get rid of the time axis (384):

# One-hot encode the integer labels: (N,) -> (N, 7)
y_train = tf.keras.utils.to_categorical(y_train, num_classes=7)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=7)

model_lstm = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim, output_dim, input_length=max_length),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM1_DIM, dropout=0.2, kernel_regularizer=regularizers.l2(WD), return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM2_DIM, dropout=0.2, kernel_regularizer=regularizers.l2(WD), return_sequences=True)),
    # Collapse the (timesteps, features) axes into one vector per sample
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(DENSE1_DIM, activation='relu', kernel_regularizer=regularizers.l2(WD)),
    tf.keras.layers.Dense(DENSE2_DIM, activation='relu'),
    # Output is now (None, 7), matching the one-hot labels
    tf.keras.layers.Dense(7, activation='softmax')
])
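
If it helps to see what the two changes do to the shapes, here is a rough NumPy sketch. It only imitates the shape effect of to_categorical and Flatten() with plain arrays; the batch of 4 labels is made up, and the width 32 assumes the second BiLSTM outputs 2 * LSTM2_DIM features per timestep.

```python
import numpy as np

num_classes = 7
y = np.array([0, 6, 2, 5])  # made-up integer class ids, like a slice of y_train

# Row i of the identity matrix is the one-hot vector for class i.
y_onehot = np.eye(num_classes, dtype="float32")[y]

# Stand-in for the second BiLSTM output: (batch, timesteps, 2 * LSTM2_DIM).
lstm_out = np.zeros((4, 384, 32), dtype="float32")

# Same shape effect as Flatten(): keep the batch axis, merge the rest.
flat = lstm_out.reshape(lstm_out.shape[0], -1)

print(y_onehot.shape)  # (4, 7)
print(flat.shape)      # (4, 12288)
```

After the dense layers the prediction has shape (batch, 7), which lines up with the one-hot labels.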
user2586955