I am building a neural collaborative filtering recommendation model using TensorFlow, with binary cross-entropy as the loss function. The labels to be predicted are, of course, binary.
During training, the loss is printed after each epoch. I have a for loop that trains the model one epoch at a time, uses the model in its current state to predict the test labels, and recomputes the loss with scikit-learn's log_loss function.
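For context, here is a minimal sketch of the setup. The toy data and the small NCF-style architecture are placeholders standing in for my real dataset and model, and here I recompute the loss on the training inputs to match the train_loss label below:

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import log_loss

# Toy stand-ins for my real interaction data (placeholders only)
rng = np.random.default_rng(42)
n_users, n_items, n_samples = 100, 50, 5000
user_ids = rng.integers(0, n_users, n_samples)
item_ids = rng.integers(0, n_items, n_samples)
labels = rng.integers(0, 2, n_samples).astype(np.float32)

# Minimal NCF-style model: embed user and item, concatenate, MLP to a sigmoid
user_in = tf.keras.Input(shape=(1,))
item_in = tf.keras.Input(shape=(1,))
u = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(n_users, 8)(user_in))
i = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(n_items, 8)(item_in))
x = tf.keras.layers.Concatenate()([u, i])
x = tf.keras.layers.Dense(16, activation="relu")(x)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model([user_in, item_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Train one epoch at a time, then recompute the loss with scikit-learn
for epoch in range(10):
    hist = model.fit([user_ids, item_ids], labels, epochs=1, verbose=0)
    preds = model.predict([user_ids, item_ids], verbose=0).ravel()
    print(f"epoch {epoch}: loss: {hist.history['loss'][0]:.4f} "
          f"train_loss: {log_loss(labels, preds):.4f}")
```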
I notice that the loss calculated by TensorFlow (shown as loss:) is consistently higher than the loss calculated by scikit-learn (shown as train_loss:).
Is this discrepancy due to slightly different math in the two implementations?