I'm learning to use bert-base-cased with a classification model. The code for the model is the following:

def mao_func(input_ids, masks, labels):
    # Pack the two model inputs into a dict keyed by the Input layer names
    return {'input_ids': input_ids, 'attention_mask': masks}, labels

dataset = dataset.map(mao_func)

BATCH_SIZE = 32
dataset = dataset.shuffle(100000).batch(BATCH_SIZE)

split = .8
ds_len = len(list(dataset))

train = dataset.take(round(ds_len * split))
val = dataset.skip(round(ds_len * split))

from transformers import TFAutoModel
bert = TFAutoModel.from_pretrained('bert-base-cased')

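Model: "tf_bert_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 bert (TFBertMainLayer)      multiple                  108310272
=================================================================
Total params: 108,310,272
Trainable params: 108,310,272
Non-trainable params: 0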

Then the network is built:

input_ids = tf.keras.layers.Input(shape=(50,), name='input_ids', dtype='int32')
mask = tf.keras.layers.Input(shape=(50,), name='attention_mask', dtype='int32')

embeddings = bert(input_ids, attention_mask=mask)[0]

X = tf.keras.layers.GlobalMaxPool1D()(embeddings)
X = tf.keras.layers.BatchNormalization()(X)
X = tf.keras.layers.Dense(128, activation='relu')(X)
X = tf.keras.layers.Dropout(0.1)(X)
X = tf.keras.layers.Dense(32, activation='relu')(X)
y = tf.keras.layers.Dense(3, activation='softmax',name='outputs')(X)

model = tf.keras.Model(inputs=[input_ids, mask], outputs=y)

model.layers[2].trainable = False

The model.summary() output is:

 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_ids (InputLayer)         [(None, 50)]         0           []                               
                                                                                                  
 attention_mask (InputLayer)    [(None, 50)]         0           []                               
                                                                                                  
 tf_bert_model (TFBertModel)    TFBaseModelOutputWi  108310272   ['input_ids[0][0]',              
                                thPoolingAndCrossAt               'attention_mask[0][0]']         
                                tentions(last_hidde                                               
                                n_state=(None, 50,                                                
                                768),                                                             
                                 pooler_output=(Non                                               
                                e, 768),                                                          
                                 past_key_values=No                                               
                                ne, hidden_states=N                                               
                                one, attentions=Non                                               
                                e, cross_attentions                                               
                                =None)                                                            
                                                                                                  
 global_max_pooling1d (GlobalMa  (None, 768)         0           ['tf_bert_model[0][0]']          
 xPooling1D)                                                                                      
                                                                                                  
 batch_normalization (BatchNorm  (None, 768)         3072        ['global_max_pooling1d[0][0]']   
 alization)                                                                                       
                                                                                                  
 dense (Dense)                  (None, 128)          98432       ['batch_normalization[0][0]']    
                                                                                                  
 dropout_37 (Dropout)           (None, 128)          0           ['dense[0][0]']                  
                                                                                                  
 dense_1 (Dense)                (None, 32)           4128        ['dropout_37[0][0]']             
                                                                                                  
 outputs (Dense)                (None, 3)            99          ['dense_1[0][0]']                
                                                                                                  
==================================================================================================
Total params: 108,416,003
Trainable params: 104,195
Non-trainable params: 108,311,808
__________________________________________________________________________________________________

Finally, the model fitting is:

optimizer = tf.keras.optimizers.Adam(0.01)
loss = tf.keras.losses.CategoricalCrossentropy()
acc = tf.keras.metrics.CategoricalAccuracy('accuracy')

model.compile(optimizer,loss=loss, metrics=[acc])

history = model.fit(
    train,
    validation_data = val,
    epochs=140
)

This fails with an execution error at line 7, the model.fit(...) call:

ValueError: Input 0 of layer "model" is incompatible with the layer: expected shape=(None, 50), found shape=(None, 1, 512)

Could anyone kindly help me understand what I did wrong and why? Thanks :)

Update: here is the Git repository with the code: https://github.com/CharlieArreola/OnlinePosts

1 Answer

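It seems that the shape of your training data doesn't match the expected input shape of your input layer. You can check the shape of your training data; for a tf.data.Dataset, inspect train.element_spec (a Dataset has no shape attribute, so train.shape() will not work).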

Your input layer input_ids = tf.keras.layers.Input(shape=(50,), name='input_ids', dtype='int32') expects training data with 50 columns, but judging by the error you most likely have 512. To fix this, you could simply change the input shape:

input_ids = tf.keras.layers.Input(shape=(512,), name='input_ids', dtype='int32')

If you keep x and y as separate arrays, you can make this more flexible by taking the sequence length (the second dimension, not the number of samples) from the data:

input_ids = tf.keras.layers.Input(shape=(train_x.shape[1],), name='input_ids', dtype='int32')

Also, don't forget to make this change in all of your input layers!
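
The error reports shape=(None, 1, 512), so besides the 50 vs. 512 mismatch each dataset element also appears to carry a stray axis of size 1. A minimal sketch of how one might inspect and remove it is below. It assumes the arrays fed to the dataset have shape (N, 1, 512); the stand-in arrays, N, SEQ_LEN and the name map_func are hypothetical and only illustrate the idea, since the real data has not been checked here.

```python
import numpy as np
import tensorflow as tf

SEQ_LEN = 512  # taken from the error message
N = 8          # hypothetical number of samples

# Stand-in arrays with a stray axis of size 1 (assumption based on the error).
input_ids = np.zeros((N, 1, SEQ_LEN), dtype=np.int32)
masks = np.ones((N, 1, SEQ_LEN), dtype=np.int32)
labels = np.zeros((N, 3), dtype=np.float32)

dataset = tf.data.Dataset.from_tensor_slices((input_ids, masks, labels))

def map_func(input_ids, masks, labels):
    # Drop the size-1 axis so each element is (SEQ_LEN,) instead of (1, SEQ_LEN).
    features = {
        'input_ids': tf.squeeze(input_ids, axis=0),
        'attention_mask': tf.squeeze(masks, axis=0),
    }
    return features, labels

dataset = dataset.map(map_func).batch(4)

# element_spec describes the batched shapes without iterating over the data.
features_spec, labels_spec = dataset.element_spec
print(features_spec['input_ids'].shape)
print(labels_spec.shape)
```

If the spec then shows a last dimension of 512, the Input layers would still need shape=(512,) to match, as described above.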

Fabian
  • Hi Fabian! I tried just changing the 50 columns to 512 without splitting, and I still get an error: "Input 0 of layer "model_1" is incompatible with the layer: expected shape=(None, 512), found shape=(None, 1, 512)". By the way, train.shape was not available in my TF, so I iterated instead: dataset_length = [i for i,_ in enumerate(train)][-1] + 1. The length is 3902. – CharlieArreola Jan 19 '22 at 15:24
  • Hmm, strange. Without your dataset I can't reproduce this and can only guess. A quick fix to try would be to include the 1 in your input layers as well: `shape=(1,512)`. Let me know if this works; otherwise you might have to tell me which dataset you are using ;) Another thing worth trying would be a Flatten() layer applied to your input, which should give an output of shape (None, 512). – Fabian Jan 19 '22 at 15:46
  • I just uploaded my 2 notebooks and the dataset here: https://github.com/CharlieArreola/OnlinePosts (the input pipeline is where I generate the input_ids, masks and labels). – CharlieArreola Jan 19 '22 at 16:43
  • Will have a look tomorrow and update my answer :) – Fabian Jan 19 '22 at 17:32
  • Hi :) A very friendly reminder that I still need help with this. Sorry and thanks! – CharlieArreola Jan 25 '22 at 02:15