
Hello, I have training data with a lot of missing values in the labels, where, for example, a single label can have the following values:

[nan, 0, 0, nan, 1, 0]

I would like to train a classification model that ignores the NaN values. Currently I have filled the NaN values with -1 and tried to slice those entries out. A mask does not work, because the categorical cross-entropy still takes them into account.

ix = tf.where(tf.not_equal(y_true, -1))
true = tf.gather(y_true, ix)
pred = tf.gather(y_pred, ix)
return keras.objectives.categorical_crossentropy(true, pred)

This is what I've been able to come up with so far, but it fails with:

InvalidArgumentError (see above for traceback): Incompatible shapes: [131] vs. [128]
         [[Node: mul_1 = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Mean, _recv_dense_3_sample_weights_0/_13)]]

Does anyone have an idea how to handle this?

  • Do you have these NaNs only in `y`, or also in `x`? – Marcin Możejko Apr 03 '17 at 20:43
  • @MarcinMożejko Only in y; X is a completely filled matrix. – CCXD Apr 04 '17 at 09:36
  • So how about simply skipping the `x`s with missing values? For training the predictor they are useless; you could use them only for pretraining. – Marcin Możejko Apr 04 '17 at 09:37
  • @MarcinMożejko Because it is a multi-label classification problem. Every sample has 1064 classes, but the training set is so sparse that a sample only has labels for ~100 out of the 1064 classes at a time. – CCXD Apr 04 '17 at 12:00

1 Answer


You could write a custom loss function that temporarily replaces the missing values with zeroes. Then, after calculating the cross-entropy loss, it zeroes out the loss values at the positions where the label was missing.

import numpy as np
import tensorflow as tf

tf.enable_eager_execution()  # TensorFlow 1.x only; eager execution is the default in TF 2.x.


def missing_values_cross_entropy_loss(y_true, y_pred):
    # We're adding a small epsilon value to prevent computing logarithm of 0 (consider y_hat == 0.0 or y_hat == 1.0).
    epsilon = tf.constant(1.0e-30, dtype=tf.float32)

    # Check that there are no NaN values in predictions (neural network shouldn't output NaNs).
    y_pred = tf.debugging.assert_all_finite(y_pred, 'y_pred contains NaN')

    # Temporarily replace missing values with zeroes, storing the missing values mask for later.
    y_true_not_nan_mask = tf.logical_not(tf.math.is_nan(y_true))
    y_true_nan_replaced = tf.where(tf.math.is_nan(y_true), tf.zeros_like(y_true), y_true)

    # Cross entropy, but split into multiple lines for readability:
    # y * log(y_hat)
    positive_predictions_cross_entropy = y_true_nan_replaced * tf.math.log(y_pred + epsilon)
    # (1 - y) * log(1 - y_hat)
    negative_predictions_cross_entropy = (1.0 - y_true_nan_replaced) * tf.math.log(1.0 - y_pred + epsilon)
    # c(y, y_hat) = -(y * log(y_hat) + (1 - y) * log(1 - y_hat))
    cross_entropy_loss = -(positive_predictions_cross_entropy + negative_predictions_cross_entropy)

    # Use the missing values mask for replacing loss values in places in which the label was missing with zeroes.
    # (y_true_not_nan_mask is a boolean tensor which, when cast to float, takes values of 0.0 or 1.0.)
    cross_entropy_loss_discarded_nan_labels = cross_entropy_loss * tf.cast(y_true_not_nan_mask, tf.float32)

    mean_loss_per_row = tf.reduce_mean(cross_entropy_loss_discarded_nan_labels, axis=1)
    mean_loss = tf.reduce_mean(mean_loss_per_row)

    return mean_loss


y_true = tf.constant([
    [0, 1, np.nan, 0],
    [0, 1, 1, 0],
    [np.nan, 1, np.nan, 0],
    [1, 1, 0, np.nan],
])

y_pred = tf.constant([
    [0.1, 0.7, 0.1, 0.3],
    [0.2, 0.6, 0.1, 0],
    [0.1, 0.9, 0.3, 0.2],
    [0.1, 0.4, 0.4, 0.2],
])

loss = missing_values_cross_entropy_loss(y_true, y_pred)

# Extract value from EagerTensor.
print(loss.numpy())

outputs:

0.4945919
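
One caveat worth noting (my addition, not part of the original answer): `tf.reduce_mean` over `axis=1` divides by the full row length, so the zeroed-out missing positions still count in the denominator and dilute the per-row mean. If you would rather average only over the observed labels, here is a sketch reusing the names from the function above:

# Normalize by the number of observed labels per row instead of the row length.
labels_per_row = tf.reduce_sum(tf.cast(y_true_not_nan_mask, tf.float32), axis=1)
# Guard against rows where every label is missing (avoids division by zero).
labels_per_row = tf.maximum(labels_per_row, 1.0)
mean_loss_per_row = tf.reduce_sum(cross_entropy_loss_discarded_nan_labels, axis=1) / labels_per_row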

Use the loss function when compiling the Keras model, as described in the documentation:

model.compile(loss=missing_values_cross_entropy_loss, optimizer='sgd')
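
As an aside (my addition, not part of the original answer): the masking approach from the question can also be made to work. `tf.where(condition)` on a 2-D tensor returns `[num_true, 2]` coordinate pairs, which is likely why the gathered tensor's size (131) stopped matching the batch-sized sample weights (128). `tf.boolean_mask` applies the same mask to both tensors and keeps them aligned. A minimal sketch, assuming TensorFlow 2.x and the element-wise binary cross-entropy used in the answer above (the helper name is my own):

import tensorflow as tf

def masked_binary_cross_entropy(y_true, y_pred):
    # Keep only the positions where a label is present. boolean_mask flattens
    # both tensors with the same mask, so their shapes stay compatible.
    mask = tf.logical_not(tf.math.is_nan(y_true))
    true = tf.boolean_mask(y_true, mask)
    pred = tf.boolean_mask(y_pred, mask)
    return tf.keras.losses.binary_crossentropy(true, pred)

Note that the flattening discards the per-sample structure, so Keras sample weights will not line up with this loss.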