
I made my own custom YOLO loss function, essentially the same as the one here: https://github.com/Neerajj9/Text-Detection-using-Yolo-Algorithm-in-keras-tensorflow/blob/master/Yolo.ipynb

While training, the loss shows up as nan. Why is that?

from keras import backend as K

def yolo_loss_function(y_true, y_pred):
    # y_true, y_pred shape: (None, 16, 16, 1, 5)
    l_coords = 5.0
    l_noob = 0.5
    coords = y_true[:,:,:,:,0] * l_coords            # weight l_coords where text is present
    noobs = (-1 * (y_true[:,:,:,:,0] - 1) * l_noob)  # weight l_noob where no text is present
    p_pred = y_pred[:,:,:,:,0]  # predicted probability that there is text
    p_true = y_true[:,:,:,:,0]  # always 1 or 0
    x_true = y_true[:,:,:,:,1]
    x_pred = y_pred[:,:,:,:,1]
    yy_true = y_true[:,:,:,:,2]
    yy_pred = y_pred[:,:,:,:,2]
    w_true = y_true[:,:,:,:,3]
    w_pred = y_pred[:,:,:,:,3]
    h_true = y_true[:,:,:,:,4]
    h_pred = y_pred[:,:,:,:,4]

    # We weight the confidence loss differently depending on whether text is present or not
    p_loss_absent = K.sum(K.square(p_pred - p_true) * noobs)
    p_loss_present = K.sum(K.square(p_pred - p_true))

    x_loss = K.sum(K.square(x_pred - x_true) * coords)
    yy_loss = K.sum(K.square(yy_pred - yy_true) * coords)
    xy_loss = x_loss + yy_loss

    w_loss = K.sum(K.square(K.sqrt(w_pred) - K.sqrt(w_true)) * coords)
    h_loss = K.sum(K.square(K.sqrt(h_pred) - K.sqrt(h_true)) * coords)
    wh_loss = w_loss + h_loss

    loss = p_loss_present + p_loss_absent + xy_loss + wh_loss

    return loss

#optimizer
from keras.optimizers import Adam
opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

#checkpoint
from keras.callbacks import ModelCheckpoint
checkpoint = ModelCheckpoint('model/text_detect.h5', monitor='val_loss', verbose=1,
                             save_best_only=True, mode='min', period=1)

model.compile(loss=yolo_loss_function, optimizer=opt, metrics=['accuracy'])

I'm doing transfer learning with the MobileNetV2 architecture.

P.S. - I found Loss goes to NAN when training the custom YOLO model. As suggested there, I tried removing the sqrt from my loss function. That removed the nan, but then my loss does not decrease: it increases steadily and stays constant at about 6. The answer on that post does not seem to help either, as I cannot see a division by 0 anywhere in my loss.
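One guard I have not tried yet (just a sketch, assuming the targets stay in (0, 1)): clip the predicted width/height into a small positive range before taking the sqrt, since sqrt of a negative prediction gives nan and the gradient of sqrt at exactly 0 is infinite:

    # inside yolo_loss_function, replacing the w/h terms
    w_pred_safe = K.clip(w_pred, K.epsilon(), 1.0)
    h_pred_safe = K.clip(h_pred, K.epsilon(), 1.0)
    w_loss = K.sum(K.square(K.sqrt(w_pred_safe) - K.sqrt(w_true)) * coords)
    h_loss = K.sum(K.square(K.sqrt(h_pred_safe) - K.sqrt(h_true)) * coords)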

Edit:

def yolo_model(input_shape):

    inp = Input(input_shape)

    model = MobileNetV2(input_tensor=inp, include_top=False, weights='imagenet')
    last_layer = model.output

    conv = Conv2D(512, (3, 3), activation='relu', padding='same')(last_layer)
    conv = Dropout(0.4)(conv)
    bn = BatchNormalization()(conv)
    lr = LeakyReLU(alpha=0.1)(bn)

    conv = Conv2D(128, (3, 3), activation='relu', padding='same')(lr)
    conv = Dropout(0.4)(conv)
    bn = BatchNormalization()(conv)
    lr = LeakyReLU(alpha=0.1)(bn)

    conv = Conv2D(5, (3, 3), activation='sigmoid', padding='same')(lr)

    # grid_h, grid_w, classes, info are defined globally elsewhere in my code
    final = Reshape((grid_h, grid_w, classes, info))(conv)

    model = Model(inp, final)

    return model

Above is my model. The activation in the last Conv2D layer was relu, which I changed to sigmoid in response to the answer below. Also, my images are normalised to the range (-1, 1). After the 1st epoch, my program showed loss: nan, accuracy: 1.0000, and below that a line saying it could not bring the loss down from inf.
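Some sanity checks I plan to run before the next attempt (a sketch; x_train/y_train are just illustrative names for my training arrays):

import numpy as np
from keras.callbacks import TerminateOnNaN

# Check that the labels themselves cannot produce nan in the loss
assert not np.isnan(y_train).any()   # no nan in the labels
assert y_train.min() >= 0.0          # w/h must be non-negative for K.sqrt
assert y_train.max() <= 1.0          # coordinates normalized to (0, 1)

# Stop immediately when the loss becomes nan instead of training on
model.fit(x_train, y_train, epochs=10, callbacks=[checkpoint, TerminateOnNaN()])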

user185887
1 Answer


You are using relu in the last layer, which is not expected and may be causing dying gradients.

In the original YOLO paper, the coordinates are bounded: the x, y coordinates, heights, and widths are all normalized to the range (0, 1). So get rid of the relu there and try a linear or sigmoid activation instead, for example:

model.add(Conv2D(7, (3, 3), padding="same"))
model.add(Activation("sigmoid"))   # or drop this line entirely for a linear output

adam = optimizers.Adam(lr=0.001)
model.compile(loss=custom_loss, optimizer=adam, metrics=["accuracy"])
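Sigmoid squashes every output channel into (0, 1), which matches targets normalized to that range and also keeps the predicted widths/heights non-negative, so K.sqrt in the loss never sees a negative input. In the functional API used in the question, the equivalent last layer would be (sketch):

conv = Conv2D(5, (3, 3), activation='sigmoid', padding='same')(lr)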
Zabir Al Nazi