
Why do these two Keras loss implementations differ?

  • The mean squared error code returns `K.mean(...)` -> a 1/N factor -> is N the batch size?
  • The categorical crossentropy code returns `reduce_sum(...)` -> no 1/N factor. I remember that categorical crossentropy also needs to be divided by the batch size.

Please explain this.

This is the code of mean squared error:

def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
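
For what it's worth, here is a minimal sketch (plain NumPy, not the actual Keras internals, with made-up numbers) of what `axis=-1` means here: the mean runs over the outputs of each individual sample, so the function returns one loss value per sample, and Keras averages those per-sample values over the batch afterwards.

import numpy as np

# hypothetical batch of 2 samples, each with 3 outputs (shapes chosen only for illustration)
y_true = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
y_pred = np.array([[1.5, 2.0, 2.0],
                   [4.0, 7.0, 6.0]])

# axis=-1 -> mean over the 3 outputs of each sample, NOT over the batch
per_sample_mse = np.mean(np.square(y_pred - y_true), axis=-1)
print(per_sample_mse)          # [0.41666667 1.33333333] -> one value per sample

# the averaging over the batch (the 1/N you are asking about) happens later, inside Keras
print(per_sample_mse.mean())   # 0.875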

This is the code of categorical crossentropy:

def categorical_crossentropy(target, output, from_logits=False, axis=-1):
"""Categorical crossentropy between an output tensor and a target tensor.
# Arguments
    target: A tensor of the same shape as `output`.
    output: A tensor resulting from a softmax
        (unless `from_logits` is True, in which
        case `output` is expected to be the logits).
    from_logits: Boolean, whether `output` is the
        result of a softmax, or is a tensor of logits.
    axis: Int specifying the channels axis. `axis=-1`
        corresponds to data format `channels_last`,
        and `axis=1` corresponds to data format
        `channels_first`.
# Returns
    Output tensor.
# Raises
    ValueError: if `axis` is neither -1 nor one of
        the axes of `output`.
"""
output_dimensions = list(range(len(output.get_shape())))
if axis != -1 and axis not in output_dimensions:
    raise ValueError(
        '{}{}{}'.format(
            'Unexpected channels axis {}. '.format(axis),
            'Expected to be -1 or one of the axes of `output`, ',
            'which has {} dimensions.'.format(len(output.get_shape()))))
# Note: tf.nn.softmax_cross_entropy_with_logits
# expects logits, Keras expects probabilities.
if not from_logits:
    # scale preds so that the class probas of each sample sum to 1
    output /= tf.reduce_sum(output, axis, True)
    # manual computation of crossentropy
    _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1. - _epsilon)
    return - tf.reduce_sum(target * tf.log(output), axis)
else:
    return tf.nn.softmax_cross_entropy_with_logits(labels=target,
                                                   logits=output)

And this is the loss wrapper that calls it:

def categorical_crossentropy(y_true, y_pred):
    return K.categorical_crossentropy(y_true, y_pred)
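
The same shape logic applies to the manual branch above: `reduce_sum(..., axis)` collapses only the class axis, producing one cross-entropy value per sample. A minimal sketch (plain NumPy, clipping and the softmax rescaling omitted, values made up for illustration):

import numpy as np

# hypothetical batch of 2 one-hot targets over 3 classes
target = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
output = np.array([[0.7, 0.2, 0.1],
                   [0.3, 0.5, 0.2]])

# reduce_sum over axis=-1 -> one cross-entropy value per sample, no division here
per_sample_ce = -np.sum(target * np.log(output), axis=-1)
print(per_sample_ce)          # [0.35667494 0.69314718]

# the division by the batch size is not missing, it is simply done later by Keras
print(per_sample_ce.mean())   # 0.52491106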
  • That averaging is not over the batch; rather it's over all the outputs of the model for **each input sample** (because the output might be a vector instead of a single number). See [this answer](https://stackoverflow.com/a/52173844/2099607). – today Sep 03 '19 at 15:26
  • Possible duplicate of [loss calculation over different batch sizes in keras](https://stackoverflow.com/questions/52172859/loss-calculation-over-different-batch-sizes-in-keras) – today Sep 03 '19 at 15:27
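
Following the comment above, one way to convince yourself (a sketch assuming TensorFlow 2.x, where the same loss functions are exposed as `tf.keras.losses.*`) is to check the shape of what the loss functions return: it is one value per sample, and the scalar loss reported during training is the mean of those values over the batch.

import tensorflow as tf

y_true = tf.constant([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
y_pred = tf.constant([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])

# both loss functions return one value per sample, not a scalar
mse = tf.keras.losses.mean_squared_error(y_true, y_pred)
cce = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
print(mse.shape, cce.shape)                 # (2,) (2,)

# the scalar reported by fit()/evaluate() is the mean over the batch,
# which is where the division by the batch size actually happens
print(float(tf.reduce_mean(mse)), float(tf.reduce_mean(cce)))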
