
I am using a Keras sequential model whose prediction output has shape (1, 5) (5 features).

I have an accuracy metric defined as follows:

For N predictions, the accuracy of the model is the percentage of predicted samples for which every feature differs from its respective true label by no more than 10.

For example, if y_i = [1, 2, 3, 4, 5], then ypred_i = [1, 2, 3, 4, 16] is not a match, since the last feature differs by 11. With the same y_i, ypred_i = [10, 8, 0, 5, 7] is a match, because every feature differs from its respective true feature by no more than 10.
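For reference, this accuracy can be computed with NumPy as follows (a minimal sketch; the function name is mine, and y_true and y_pred are assumed to be integer arrays of shape (N, 5)):

```python
import numpy as np

def within_10_accuracy(y_true, y_pred):
    # A sample counts as a match only if every one of its features
    # differs from the true value by no more than 10.
    matches = np.all(np.abs(y_true - y_pred) <= 10, axis=1)
    return matches.mean()

y_true = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]])
y_pred = np.array([[1, 2, 3, 4, 16],   # last feature off by 11 -> no match
                   [10, 8, 0, 5, 7]])  # all features within 10 -> match
print(within_10_accuracy(y_true, y_pred))  # -> 0.5
```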

I am wondering which loss function to use in my Keras sequential model so as to maximize the accuracy described above. If I should define a custom loss function, what should it look like, and how should I proceed?

My code is:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

class NeuralNetMulti(Regressor):
    def __init__(self):
        self.name = 'keras-sequential'
        self.model = Sequential()
        # self.earlystopping = callbacks.EarlyStopping(monitor="mae",
        #                                              mode="min", patience=5,
        #                                              restore_best_weights=True)

    def fit(self, X, y):
        print('Fitting into the neural net...')
        n_inputs = X.shape[1]
        n_outputs = y.shape[1]
        self.model.add(Dense(400, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
        # self.model.add(Dense(20, activation='relu'))
        self.model.add(Dense(200, activation='relu'))
        # self.model.add(Dense(10, activation='relu'))
        self.model.add(Dense(n_outputs))
        self.model.summary()
        self.model.compile(loss='mae', optimizer='adam', metrics=['mse', 'mae', 'accuracy'])
        history = self.model.fit(X, y, verbose=1, epochs=200, validation_split=0.1)
        # self.model.fit(X, y, verbose=1, epochs=1000, callbacks=[self.earlystopping])
        print('Fitting completed!')

    def predict(self, X):
        print('Predicting...')
        predictions = self.model.predict(X, verbose=1)
        print('Predicted!')
        return predictions

My suggestion for a loss function:

def N_distance(y_true, y_pred):
    vals = abs(y_true - y_pred)
    if all(a <= 10 for a in vals):
        return 0
    return 1

It returns:

  • 0 if the condition holds
  • 1 otherwise.
  • Note that with the current approach (using the mae function) I achieve 55% accuracy (55% of the samples comply with the condition mentioned). But the rmse I got is about 7.5, which is very good considering that the output features range from 0 to 100, and I feel that I just need to use a more appropriate loss function. – Petar Jun 25 '21 at 08:21
  • If you suggest a method, I will use it and run it and see how it performs. – Petar Jun 25 '21 at 08:26
  • Maybe it is helpful to try ``Huber loss`` with threshold 10: https://www.tensorflow.org/api_docs/python/tf/keras/losses/Huber – Kaveh Jun 25 '21 at 10:34
  • So set self.model.compile(loss=tf.keras.losses.Huber(delta=10), optimizer='adam', metrics=['mse', 'mae', 'accuracy']) ? – Petar Jun 25 '21 at 10:45
  • I think Huber loss is the opposite of your objective. Huber loss is less sensitive to outliers, since it returns a quadratic loss for errors below the threshold but a linear one for outliers. If you are more sensitive to outliers (above 10), get inspiration from Huber loss and implement it in reverse. – Kaveh Jun 25 '21 at 10:46
  • I used Huber loss and I got no improvement – Petar Jun 25 '21 at 11:33
  • I have not faced Huber loss before and I am not sure how to implement something that is reverse to it. – Petar Jun 25 '21 at 11:34
  • Should I maybe change my model, architecture or anything? – Petar Jun 25 '21 at 11:38
  • def my_huber_loss_with_threshold(threshold):
        def my_huber_loss(y_true, y_pred):
            error = y_true - y_pred
            is_big_error = tf.abs(error) >= threshold
            big_error_loss = tf.square(error) / 2
            small_error_loss = threshold * (tf.abs(error) - (0.5 * threshold))
            return tf.where(is_big_error, big_error_loss, small_error_loss)
        return my_huber_loss
    – Kaveh Jun 25 '21 at 11:43
  • Its a regression task why are you using accuracy as a metric? – yudhiesh Jun 25 '21 at 11:45
  • Well the task is to predict each one of the five output features. They are integers between 0 and 100. I am required to use the metric that I mentioned. It's not up to me. I also think it's better to use rmse, for example – Petar Jun 25 '21 at 12:35
  • Guys, does anyone have a suggestion? – Petar Jun 26 '21 at 10:54
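A standalone version of Kaveh's "reverse Huber" idea from the comments might look like the sketch below. The function name and the 0.5 * threshold ** 2 constant (added so the two branches join continuously at the threshold) are my own choices, not from the comments:

```python
import tensorflow as tf

def reverse_huber_loss(threshold=10.0):
    # Opposite of standard Huber: linear for small errors, quadratic
    # (hence more sensitive) for errors above the threshold.
    def loss(y_true, y_pred):
        error = tf.abs(y_true - y_pred)
        linear = threshold * error
        # 0.5 * threshold**2 makes both branches equal at |error| == threshold
        quadratic = 0.5 * tf.square(error) + 0.5 * threshold ** 2
        return tf.reduce_mean(tf.where(error > threshold, quadratic, linear))
    return loss
```

It could be passed as loss=reverse_huber_loss(10) in model.compile; whether it actually improves the "within 10" accuracy would have to be verified empirically.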

1 Answer


First of all, your loss needs to be differentiable so that it is possible to compute its gradient with respect to the weights. That gradient is then used to optimize the weights, which is the whole point of gradient-based optimization algorithms like gradient descent. If you write your own loss, this is the first thing to keep in mind, and it is why your loss does not work: a function that only ever returns 0 or 1 provides no usable gradient. You need to rethink your loss or the whole problem.

Next, do not forget that you need to use Keras or TensorFlow functions in your loss, so that the functions you use have a defined gradient and the chain rule can be applied. Using plain abs() is not a good idea. This question might point you in the right direction: https://ai.stackexchange.com/questions/26426/why-is-tf-abs-non-differentiable-in-tensorflow.

Furthermore, from your question and comments I see that the expected output should be between 0 and 100. In that case, I would try to scale the outputs and the labels of the network so that they always lie in that range. There are multiple ways to go about it: divide your labels by 100 and use a sigmoid activation on the outputs, or check e.g. this answer: How to restrict output of a neural net to a specific range?.

Then you can start thinking about how to write your loss. From your description it is not clear what should happen in this case: y_i = [1, 2, 3, 4, 100] and pred = [1, 2, 3, 4, 110]. Is the value 110 still acceptable even though it should not be possible in theory?

Anyway, you can just use mae or mse as a loss. Your network will try to fit the targets as closely as possible, and you can then use your special non-differentiable function purely as a metric to measure how well your network is trained according to your rules.

An explicit example:

  • The last layer of your network needs an activation specified like so: self.model.add(Dense(n_outputs, activation='sigmoid')), which will scale all of the network output to the interval from 0 to 1.
  • Since your labels are defined on the interval from 0 to 100, you just need to divide them by 100 (y /= 100) so that they also lie in the interval from 0 to 1 before using them in the network.
  • Then you can use mae or mse as a loss and your special function just as a metric. self.model.compile(loss='mae', optimizer='adam', metrics=[custom_metric])

The custom_metric function can look like this:

import tensorflow as tf

def custom_metric(y_true, y_pred):
    # 10 on the original 0-100 scale becomes 0.1 after scaling to 0-1
    valid_distance = 0.1
    # valid[i, j] is True if feature j of sample i is within the tolerance
    valid = tf.abs(y_true - y_pred) <= valid_distance
    # a sample counts as a match only if all of its features are valid
    return tf.reduce_mean(tf.cast(tf.reduce_all(valid, axis=1), tf.float32))
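As a quick sanity check, here is the metric applied to a made-up example of mine, with labels and predictions already scaled to 0..1 (so the original tolerance of 10 becomes 0.1); the metric definition is repeated so the snippet runs standalone:

```python
import tensorflow as tf

def custom_metric(y_true, y_pred):
    # same metric as above, repeated so this snippet is self-contained
    valid_distance = 0.1
    valid = tf.abs(y_true - y_pred) <= valid_distance
    return tf.reduce_mean(tf.cast(tf.reduce_all(valid, axis=1), tf.float32))

y_true = tf.constant([[0.01, 0.02, 0.03, 0.04, 0.05],
                      [0.01, 0.02, 0.03, 0.04, 0.05]])
y_pred = tf.constant([[0.01, 0.02, 0.03, 0.04, 0.16],   # off by 0.11 -> no match
                      [0.10, 0.08, 0.00, 0.05, 0.07]])  # all within 0.1 -> match
print(float(custom_metric(y_true, y_pred)))  # -> 0.5
```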
  • Okay, so use MSE function. And yes, 110 is not a valid value. I tried restricting the output but didn't get much improvement. Any other ideas? – Petar Jun 28 '21 at 07:02
  • Have you restricted the output of your network using `sigmoid` activation while also dividing your labels by 100 and using `MSE` as a loss? In this case it should learn. If it does not, the problem is probably somewhere else. – Paloha Jun 28 '21 at 13:29
  • Where should I put the sigmoid function? Also, what do you mean by divide labels by 100 while using MSE? Could you help me with this? :) – Petar Jun 28 '21 at 20:25
  • I updated the answer with an explicit example. – Paloha Jun 29 '21 at 05:23
  • Hey, Paloha! Thank you for your updated answer. The network now outputs values in the range (0, 100) (because of the sigmoid). Does that mean I should also scale down the test values? Like, if I have [5, 20, 13, 61, 99], it should be [0.05, 0.2, 0.13, 0.61, 0.99], or should I just scale up the predictions (just multiply them by 100?) – Petar Jun 29 '21 at 10:01
  • I already answered this in the second bullet point in the "Explicit example". You should divide, to have everything in the same range. – Paloha Jun 29 '21 at 10:07
  • Oh, you mean just compare them when they are between 0 and 1? So if I have 0.4 and 0.6, the distance is then 20, right? – Petar Jun 29 '21 at 10:16
  • metrics = [N_distance] gives error: OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature. – Petar Jun 29 '21 at 13:52
  • I have yet again updated the answer. Please mark the answer as correct if it resolved your problem described in the question. – Paloha Jun 29 '21 at 21:35