
I'm building a classifier for predicting the labels -1 and 1. When I encode the labels with a one-hot encoder and use categorical cross entropy, I don't have any problems with learning.

model1.add(Dense(2, activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer=optim, metrics=['accuracy'])

When I keep the labels without encoding and try to use sparse categorical cross entropy, the model loss is NaN.

model1.add(Dense(2, activation='softmax'))
model1.compile(loss='sparse_categorical_crossentropy', optimizer=optim, metrics=['accuracy'])

Epoch 1/3000
20748/20748 [==============================] - 2s 78us/sample - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00

When I encode the labels with an ordinal encoder to be 0 and 1 instead of -1 and 1, and train the model with sparse categorical cross entropy, the model doesn't show this problem with the loss.
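A simplified sketch of such an encoding step (assuming sklearn's OrdinalEncoder and labels in a 1-D NumPy array):

import numpy as np
from sklearn.preprocessing import OrdinalEncoder

y = np.array([-1, 1, 1, -1])                      # original labels

# OrdinalEncoder sorts the categories, so -1 -> 0 and 1 -> 1;
# it returns floats, so cast to int for sparse_categorical_crossentropy
enc = OrdinalEncoder()
y01 = enc.fit_transform(y.reshape(-1, 1)).astype('int32').ravel()
print(y01)                                        # [0 1 1 0]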

What is the reason for this problem? I read that training with labels that are symmetric around 0 is a problem, but what is the explanation for that?

I know my problem is binary classification, but I wanted to test my model on binary data before using it for all labels.

a_jelly_fish

3 Answers


SparseCategoricalCrossentropy expects labels to be provided as integers, and those integer tokens are converted internally to a one-hot-encoded label starting at index 0. The class indices therefore have to be 0 to num_classes - 1, which is not what is in your data: having two classes, you need to provide the labels as 0 and 1, not -1 and 1. A label of -1 falls outside the valid index range, so the loss is undefined and comes out as NaN. Therefore it is as you write, you can either:

  • Run it with one-hot encoded labels using Categorical Crossentropy, or
  • run it with integer labels 0 and 1 using Sparse Categorical Crossentropy (both sketched below),

but

  • running it with labels -1 and 1 using Sparse Categorical Crossentropy will fail.
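A minimal sketch of the two working options (assuming the labels are in a NumPy array y and TensorFlow 2.x Keras):

import numpy as np
from tensorflow.keras.utils import to_categorical

y = np.array([-1, 1, 1, -1])             # original labels
y_int = ((y + 1) // 2).astype('int32')   # remap: -1 -> 0, 1 -> 1

# option 1: one-hot labels + categorical_crossentropy
y_onehot = to_categorical(y_int, num_classes=2)
# model1.compile(loss='categorical_crossentropy', ...)

# option 2: integer labels 0/1 + sparse_categorical_crossentropy
# model1.compile(loss='sparse_categorical_crossentropy', ...)
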
Stat Tistician

Another case I observed was that the labels I passed were in float format. As the documentation reads, you need to make sure the labels are integers and not floats. This fixed my issue.
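A minimal sketch of the cast, assuming the labels sit in a NumPy array:

import numpy as np

y = np.array([0.0, 1.0, 1.0, 0.0])   # float labels trip up the loss
y = y.astype('int32')                # integer class indices: [0 1 1 0]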

SecretAgent

Missing labels in the range 0 to num_classes - 1 cause NaN for sparse_categorical_crossentropy.

A quick hack, if you would like to use sparse categorical cross entropy in these situations: add just one sample for each missing label to both the training and testing datasets.

For images, you can change the label of an existing training/testing sample to the missing label and run the fit function. You will then start seeing a loss value instead of NaN.

Accordingly, the number of neurons in the last/top layer also needs to be changed. A sketch of this hack follows.
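A minimal sketch of this hack (the helper name is my own; it assumes 1-D integer label arrays):

import numpy as np

def patch_missing_labels(y, num_classes):
    # relabel one existing sample per missing class so that every
    # index in 0..num_classes-1 occurs at least once
    present = set(np.unique(y))
    missing = [c for c in range(num_classes) if c not in present]
    for i, c in enumerate(missing):
        y[i] = c  # sacrifice sample i: overwrite its label
    return y

y_train = np.array([0, 0, 2, 2, 3])   # class 1 is missing
y_train = patch_missing_labels(y_train, num_classes=4)
print(np.unique(y_train))             # [0 1 2 3]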

jkr