Keras multi-class classification loss is too high

Question

I am training a model in multi class classification to generate texts. Below is a sample of the dataset.

state	district	month	rainfall	max_temp	min_temp	max_rh	min_rh	wind_speed	advice
Orissa	Kendrapada	february	0.0	34.6	19.4	88.2	29.6	12.0	chances of foot rot disease in paddy crop; apply urea at 3 weeks after transplanting at active tillering stage for paddy;......
Jharkhand	Saraikela Kharsawan	february	0	35.2	16.6	29.4	11.2	3.6	provide straw mulch and go for intercultural operations to avoid moisture losses from soil; chance of leaf blight disease in potato crop; .......

Below is my code through which the model is made.

def create_model():
    input1 = tf.keras.layers.Input(shape=(1,), name='state')
    input2 = tf.keras.layers.Input(shape=(1,), name='district')
    input3 = tf.keras.layers.Input(shape=(1,), name='month')
    input4 = tf.keras.layers.Input(shape=(1,), name='rainfall')
    input5 = tf.keras.layers.Input(shape=(1,), name='max_temp')
    input6 = tf.keras.layers.Input(shape=(1,), name='min_temp')
    input7 = tf.keras.layers.Input(shape=(1,), name='max_rh')
    input8 = tf.keras.layers.Input(shape=(1,), name='min_rh')
    input9 = tf.keras.layers.Input(shape=(1,), name='wind_speed')
    xz= [input1, input2, input3, input4, input5, input6, input7, input8, input9]
    x1= layers.Dense(128, activation='relu')(input1)
    x2=layers.Dense(128, activation='relu')(input2)
    x3=layers.Dense(128, activation='relu')(input3)
    x4=layers.Dense(128, activation='relu')(input4)
    x5=layers.Dense(128, activation='relu')(input5)
    x6=layers.Dense(128, activation='relu')(input6)
    x7=layers.Dense(128, activation='relu')(input7)
    x8=layers.Dense(128, activation='relu')(input8)
    x9=layers.Dense(128, activation='relu')(input9)
    base_model =  layers.Add()([x1,x2, x3, x4, x5, x6, x7, x8, x9])
    first_output = layers.Dense(30, name='output_1')(base_model) 
    second_output = layers.Dense(30, name='output_2')(base_model)
    third_output = layers.Dense(30, name='output_3')(base_model)
    fourth_output = layers.Dense(30, name='output_4')(base_model)
    fifth_output = layers.Dense(30, name='output_5')(base_model)
    models = tf.keras.Model(inputs=xz,
                  outputs=[first_output, second_output, third_output, fourth_output, fifth_output])
    return models

The code for my model compilation.

model=create_model()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=optimizer,
              loss={'output_1': 'categorical_crossentropy', 
                    'output_2': 'categorical_crossentropy',
                    'output_3': 'categorical_crossentropy',
                    'output_4': 'categorical_crossentropy',
                    'output_5': 'categorical_crossentropy'},
              metrics={'output_1':tf.keras.metrics.Accuracy(),
                       'output_2':tf.keras.metrics.Accuracy(),
                       'output_3':tf.keras.metrics.Accuracy(),
                       'output_4':tf.keras.metrics.Accuracy(),
                       'output_5':tf.keras.metrics.Accuracy()})

Finally, the problem I am facing, the loss and accuracy. Loss is too high.

Epoch 499/500
2/2 [==============================] - 0s 11ms/step - loss: 66362.0130 - output_1_loss: 5827.9458 - output_2_loss: 10478.4935 - output_3_loss: 16566.5957 - output_4_loss: 16831.8887 - output_5_loss: 16657.0967 - output_1_accuracy: 0.0000e+00 - output_2_accuracy: 0.0000e+00 - output_3_accuracy: 0.0000e+00 - output_4_accuracy: 0.0000e+00 - output_5_accuracy: 0.0000e+00
Epoch 500/500
2/2 [==============================] - 0s 11ms/step - loss: 66362.0130 - output_1_loss: 5827.9458 - output_2_loss: 10478.4935 - output_3_loss: 16566.5957 - output_4_loss: 16831.8887 - output_5_loss: 16657.0967 - output_1_accuracy: 0.0000e+00 - output_2_accuracy: 0.0000e+00 - output_3_accuracy: 0.0000e+00 - output_4_accuracy: 0.0000e+00 - output_5_accuracy: 0.0000e+00

Kindly help me and correct me where I am wrong. I am total newbie to this field.

Alternative Model Update

model = tf.keras.Sequential([
  feature_layer,
  layers.Dense(128, activation='relu'),
  layers.Dense(128, activation='relu'),
  layers.Dropout(.1),
  layers.Dense(150),
])
opt = Adam(learning_rate=0.01)
model.compile(optimizer=opt,
              loss='mean_squared_error',
              metrics=['accuracy'])

It have the [5,30] shaped input reshaped to [150].

What classes are you trying to predict? Only possible classes I see are `state`, `district` and `month`. — yudhiesh, Apr 23 '21 at 03:44
have you tried to reduce the learning rate? maybe it is case of exploding gradient — Vinson Ciawandy, Apr 23 '21 at 03:53
The classes I am trying to predict are the ```advices``` which are in shape of [5,30]. Actually further in my code I had separated the [5,30] single column into 5 colums each with a tensor of shape [30]. — Santosh Kumar, Apr 23 '21 at 04:10
@yudhiesh Well, no they are not one hot encoded. I used the Keras text preprocessing's Tokenizer and pad_sequences. — Santosh Kumar, Apr 23 '21 at 04:14
Your code would be a lot more readable if you just passed 9 inputs into your model and had one dense layer of size 128, and then passed this into a sum. It would perform the exact same operation but be a bit less confusing. — jhso, Apr 23 '21 at 04:14
@yudhiesh Here is how they are encoded. ``` array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 54, 55, 21, 56, 57, 3, 22, 19, 58, 6, 59, 4, 60, 1, 61, 62, 23, 63, 23, 64], [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 65, 7, 66, 2, 67, 68, 3, 69, 70].....] dtype=int32)``` — Santosh Kumar, Apr 23 '21 at 04:15
@jhso Well, i get a little what you ar trying to say. I am posting an update to this qus at bottom, please see if that is what you are referring to. — Santosh Kumar, Apr 23 '21 at 04:17
@santhosh sure. for more intuitive explanation, check this https://machinelearningmastery.com/exploding-gradients-in-neural-networks/#:~:text=Exploding%20gradients%20are%20a%20problem,learn%20from%20your%20training%20data. for a bit math introduction, go to here https://www.youtube.com/watch?v=qhXZsFVxGKo. — Vinson Ciawandy, Apr 23 '21 at 04:20
Please do **not** post code or data in the comments - edit & update your question instead. — desertnaut, Apr 23 '21 at 10:01
Check that data range in input is between 0-1, and if not (I do not see that in the code parts shown), make the preprocessing correspondingly. This may help partially your problem. — Experience_In_AI, Apr 23 '21 at 12:12
@Experience_In_AI I did normalized the inputs. Well that got my loss lower. But my accuracy is still varying between 0.1- 0.3 — Santosh Kumar, Apr 24 '21 at 06:31
@SantoshKumar next try to change your "Add" principle to form of "Concatenate" to optimize proper infromation flow inside the network. I'll add the this idea as an separate answer. — Experience_In_AI, Apr 26 '21 at 04:20

score 0 · Answer 1 · answered Apr 26 '21 at 04:33

To enhance the model structure please see the following example code, including a "model_simple" alternative for the original network. Train the both with the same input data, vary the structure of the "model_simple" and find out what structure results in the best accuracy.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


def create_model():
    input1 = tf.keras.layers.Input(shape=(1,), name='state')
    input2 = tf.keras.layers.Input(shape=(1,), name='district')
    input3 = tf.keras.layers.Input(shape=(1,), name='month')
    input4 = tf.keras.layers.Input(shape=(1,), name='rainfall')
    input5 = tf.keras.layers.Input(shape=(1,), name='max_temp')
    input6 = tf.keras.layers.Input(shape=(1,), name='min_temp')
    input7 = tf.keras.layers.Input(shape=(1,), name='max_rh')
    input8 = tf.keras.layers.Input(shape=(1,), name='min_rh')
    input9 = tf.keras.layers.Input(shape=(1,), name='wind_speed')
    xz= [input1,input2,input3,input4,input5,input6,input7,input8,input9]
    x1= layers.Dense(128, activation='relu')(input1)
    x2=layers.Dense(128, activation='relu')(input2)
    x3=layers.Dense(128, activation='relu')(input3)
    x4=layers.Dense(128, activation='relu')(input4)
    x5=layers.Dense(128, activation='relu')(input5)
    x6=layers.Dense(128, activation='relu')(input6)
    x7=layers.Dense(128, activation='relu')(input7)
    x8=layers.Dense(128, activation='relu')(input8)
    x9=layers.Dense(128, activation='relu')(input9)
    base_model =  layers.Add()([x1,x2, x3, x4, x5, x6, x7, x8, x9])
    first_output = layers.Dense(30,name='output_1')(base_model)
    second_output = layers.Dense(30,name='output_2')(base_model)
    third_output= layers.Dense(30,name='output_3')(base_model)
    fourth_output= layers.Dense(30,name='output_4')(base_model)
    fifth_output = layers.Dense(30,name='output_5')(base_model)
    models = tf.keras.Model(inputs=xz,
                  outputs=[first_output,second_output,third_output,fourth_output,fifth_output])
    return models

def create_model_simple():
    input1 = tf.keras.layers.Input(shape=(1,), name='state')
    input2 = tf.keras.layers.Input(shape=(1,), name='district')
    input3 = tf.keras.layers.Input(shape=(1,), name='month')
    input4 = tf.keras.layers.Input(shape=(1,), name='rainfall')
    input5 = tf.keras.layers.Input(shape=(1,), name='max_temp')
    input6 = tf.keras.layers.Input(shape=(1,), name='min_temp')
    input7 = tf.keras.layers.Input(shape=(1,), name='max_rh')
    input8 = tf.keras.layers.Input(shape=(1,), name='min_rh')
    input9 = tf.keras.layers.Input(shape=(1,), name='wind_speed')
    #xz= [input1,input2,input3,input4,input5,input6,input7,input8,input9]
    #x1=layers.Dense(128, activation='relu')(input1)
    #x2=layers.Dense(128, activation='relu')(input2)
    #x3=layers.Dense(128, activation='relu')(input3)
    #x4=layers.Dense(128, activation='relu')(input4)
    #x5=layers.Dense(128, activation='relu')(input5)
    #x6=layers.Dense(128, activation='relu')(input6)
    #x7=layers.Dense(128, activation='relu')(input7)
    #x8=layers.Dense(128, activation='relu')(input8)
    #x9=layers.Dense(128, activation='relu')(input9)
    yhdistelma=layers.concatenate([input1,input2, input3, input4, input5, input6, input7, input8, input9])
    #base_model =  layers.Add()([x1,x2, x3, x4, x5, x6, x7, x8, x9])
    first_output = layers.Dense(30,name='output_1')(yhdistelma)
    second_output = layers.Dense(30,name='output_2')(yhdistelma)
    third_output= layers.Dense(30,name='output_3')(yhdistelma)
    fourth_output= layers.Dense(30,name='output_4')(yhdistelma)
    fifth_output = layers.Dense(30,name='output_5')(yhdistelma)
    models = tf.keras.Model(inputs=[input1,input2,input3,input4,input5, input6, input7, input8, input9],
                  outputs=[first_output,second_output,third_output,fourth_output,fifth_output])
    return models

model=create_model()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
model.compile(optimizer=optimizer,
              loss={'output_1': 'categorical_crossentropy',
                    'output_2': 'categorical_crossentropy',
                    'output_3': 'categorical_crossentropy',
                    'output_4': 'categorical_crossentropy',
                    'output_5': 'categorical_crossentropy'},
              metrics={'output_1':tf.keras.metrics.Accuracy(),
                       'output_2':tf.keras.metrics.Accuracy(),
                       'output_3':tf.keras.metrics.Accuracy(),
                       'output_4':tf.keras.metrics.Accuracy(),
                       'output_5':tf.keras.metrics.Accuracy()})

model.summary()

keras.utils.plot_model(model,'model_structure.png',show_dtype=True)


#Let's create a more simple model version:
model_simple=create_model_simple()

model.compile(optimizer=optimizer,
              loss={'output_1': 'categorical_crossentropy',
                    'output_2': 'categorical_crossentropy',
                    'output_3': 'categorical_crossentropy',
                    'output_4': 'categorical_crossentropy',
                    'output_5': 'categorical_crossentropy'},
              metrics={'output_1':tf.keras.metrics.Accuracy(),
                       'output_2':tf.keras.metrics.Accuracy(),
                       'output_3':tf.keras.metrics.Accuracy(),
                       'output_4':tf.keras.metrics.Accuracy(),
                       'output_5':tf.keras.metrics.Accuracy()})

model_simple.summary()

keras.utils.plot_model(model_simple,'model_simple_structure.png',show_dtype=True)

...especially, please note that the key difference between your original and more simple model is that "Add" has been replaced with "Concatenate". The "Add" results in output size of same than one of its inputs, but the size of "Concatenate" output is much much higher, that kind of things may have an effect for the performance.

That made the code much simpler to understand. Can I add LSTM to each output instead of a single Dense? — Santosh Kumar, Apr 28 '21 at 04:31

Keras multi-class classification loss is too high

1 Answers1