I am implementing a CNN coupled to a multiple instance learning layer. In brief, I've got this, with C the number of categories:
[1 batch of images, 1 label] -> CNN -> Custom final layer -> [1 vector of size C]
My final layer just sums up the previous layer's output for the moment. To be clear, 1 batch of inputs only gives 1 single output. Each batch therefore corresponds to multiple instances gathered into 1 single bag, associated with 1 label.
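For reference, this is roughly the shape contract of my generator: one step yields one whole bag of instances plus a single bag-level label (the names and dimensions below are illustrative; the real generator reads images from disk):

    import numpy as np

    def bag_generator_sketch(n_instances=32, height=64, width=64, n_categories=3):
        # Stand-in for training_generator: each yield is one bag of
        # instances and one one-hot label for the whole bag.
        while True:
            bag = np.random.rand(n_instances, height, width, 3)
            label = np.zeros((1, n_categories))
            label[0, np.random.randint(n_categories)] = 1.0
            yield bag, label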
When I train my model and validate it with the same set:
    history = model.fit_generator(
        generator=training_generator,
        steps_per_epoch=training_set.batch_count,
        epochs=max_epoch,
        validation_data=training_generator,
        validation_steps=training_set.batch_count)
I get 2 different results between the training and the validation sets, even though they are the exact same set:
35/35 [==============================] - 30s 843ms/step - loss: 1.9647 - acc: 0.2857 - val_loss: 1.9403 - val_acc: 0.3714
The loss function is just the categorical cross entropy as implemented in Keras (I have 3 categories). I implemented my own loss function to get some insight into what happens. Unfortunately, I obtain another inconsistency, this time between the regular loss and my custom loss function:
35/35 [==============================] - 30s 843ms/step - loss: 1.9647 - acc: 0.2857 - bag_loss: 1.1035 - val_loss: 1.9403 - val_acc: 0.3714 - val_bag_loss: 1.0874
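My current suspicion is that the order of the reductions matters: if Keras averages the per-instance losses over the batch axis, while my bag_loss averages the vectors first and only then takes the cross entropy, the two results will generally disagree. A toy numpy check with made-up numbers (ignoring Keras's internal clipping and rescaling) illustrates the gap:

    import numpy as np

    def cce(t, p):
        # categorical cross entropy for a single probability vector
        return -np.sum(t * np.log(p))

    y_true = np.array([[1.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0]])  # bag label repeated per instance
    y_pred = np.array([[0.6, 0.3, 0.1],
                       [0.2, 0.5, 0.3]])

    mean_of_losses = np.mean([cce(t, p) for t, p in zip(y_true, y_pred)])
    loss_of_mean = cce(y_true.mean(axis=0), y_pred.mean(axis=0))

    print(mean_of_losses)  # ~1.06: reduce per instance, then average
    print(loss_of_mean)    # ~0.92: average first, then reduce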
My loss function:
    import keras

    def bag_loss(y_true, y_predicted):
        # Average labels and predictions over the batch (bag) axis first,
        # then take the cross entropy of the two bag-level mean vectors.
        y_true_mean = keras.backend.mean(y_true, axis=0, keepdims=False)
        y_predicted_mean = keras.backend.mean(y_predicted, axis=0, keepdims=False)
        loss = keras.losses.categorical_crossentropy(y_true_mean, y_predicted_mean)
        return loss
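For completeness, bag_loss is attached as an extra metric at compile time, which is why it appears next to the built-in loss in the log (the optimizer choice here is incidental):

    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy', bag_loss])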
The final layer of my model (I only show the call part, for concision):
    def call(self, x):
        # kb is an alias for keras.backend
        x = kb.sum(x, axis=0, keepdims=True)  # collapse the instance (batch) axis
        x = kb.dot(x, self.kernel)
        x = kb.bias_add(x, self.bias)
        out = kb.sigmoid(x)
        return out
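To double-check that this layer really collapses a whole batch into one prediction, here is a numpy stand-in for the same three operations (the feature size of 128 is made up):

    import numpy as np

    x = np.random.rand(32, 128)            # one bag: (n_instances, features)
    kernel = np.random.rand(128, 3)        # (features, C)
    bias = np.zeros(3)

    pooled = x.sum(axis=0, keepdims=True)  # (1, 128): instance axis collapsed
    logits = pooled.dot(kernel) + bias     # (1, 3)
    out = 1.0 / (1.0 + np.exp(-logits))    # sigmoid, still (1, 3)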
After inspecting the code with TensorBoard and the TensorFlow Debugger, I found that my bag loss and the regular loss do return the same value at some point. But then, Keras performs 6 supplemental additions on the regular sigmoid loss (1 for each layer in my model). Can someone help me disentangle this ball of surprising results? I expect the regular loss, the validation loss, and my bag loss to all be the same.