Let's say that the target column has 4 unique values: red, blue, green, yellow
and the corpus is converted to TF-IDF values. The first 3 rows look like this:
word_1 | word_2 | target
0.567  | 0.897  | red
0.098  | 0.238  | blue
0.66   | 0.786  | green
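For context, here is a minimal sketch of how such a frame could be produced; the corpus and labels below are hypothetical placeholders:

from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

# hypothetical data: one short document and one colour label per sample
corpus = ["deep red sunset", "calm blue sea", "green grass grows"]
labels = ["red", "blue", "green"]

# convert the corpus to a TF-IDF matrix of shape (n_samples, n_words)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus).toarray()

df = pd.DataFrame(X, columns=vectorizer.get_feature_names_out())
df["target"] = labels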
One-Hot Encoding
After one-hot encoding the target, your target looks like an array of the form:
array([[1., 0., 0., 0.],    <- category 'red'
       [0., 1., 0., 0.],    <- category 'blue'
       [0., 0., 1., 0.],    <- category 'green'
       ...])
Here the target has the shape (n_samples, n_classes), which is (n, 4). In this case the final activation has to be softmax (sigmoid would only be appropriate for multi-label targets, where a sample can belong to several classes at once), and you train your model with the categorical_crossentropy loss. The code answering your question would be:
model.add(Dense(4, activation='softmax'))  # one output unit per class
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # expects 2D one-hot targets
              metrics=['accuracy'])
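For reference, one way to produce such a one-hot target is scikit-learn's LabelBinarizer (a minimal sketch; note that the column order follows lb.classes_, which is alphabetical, rather than the illustrative order above). keras.utils.to_categorical works too if your labels are already integers:

from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
y_onehot = lb.fit_transform(["red", "blue", "green", "yellow"])
print(lb.classes_)     # ['blue' 'green' 'red' 'yellow'] -- column order
print(y_onehot.shape)  # (4, 4): one row per sample, one column per class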
Label Encoding
After label-encoding the target, your target is a 1D integer array of the form:
array([2, 0, 1, ...])
with shape (n_samples,), where each entry is the class index of one sample (here red=2, blue=0, green=1 in alphabetical class order). Here the code will be:
model.add(Dense(4, activation='softmax'))  # one output unit per class
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # expects 1D integer targets
              metrics=['accuracy'])
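For completeness, here is a sketch of producing such integer targets with scikit-learn's LabelEncoder (the label list is a placeholder):

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_int = le.fit_transform(["red", "blue", "green", "yellow"])
print(y_int)        # [2 0 1 3] -- integers in [0, n_classes)
print(y_int.shape)  # (4,): a 1D array, one integer per sample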
Prediction
The numbers you see are the probabilities of each class for the given input sample. For example, [[0.4846592 0.5153408]] means that the given sample belongs to class 0 with a probability of around 0.48 and to class 1 with a probability of around 0.52. Since you want the class with the highest probability, you can use np.argmax to find the index (i.e. 0 or 1) of the maximum value:
import numpy as np

# index of the highest-probability class for each sample
pred_class = np.argmax(y_pred, axis=-1)
Further, this has nothing to do with the loss function of the model. These probabilities are produced by the last layer of your model, which very likely uses softmax as its activation function to normalize the output into a probability distribution.
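If you kept the fitted encoder from the label-encoding step, you can also map the predicted indices back to the original class names; a sketch assuming le is the LabelEncoder fitted earlier:

# map the winning indices back to the original string labels
class_names = le.inverse_transform(pred_class)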
Conclusion
- The error you are getting is caused by using a loss function that does not match the encoding of your target.
- If you have a 1D integer-encoded (LabelEncoded) target, you should use sparse_categorical_crossentropy as the loss function.
- If you have one-hot encoded your target to get the 2D shape (n_samples, n_classes), you should use categorical_crossentropy as the loss function (a short end-to-end sketch follows below).
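To tie the two rules together, here is a compact end-to-end sketch of the label-encoded variant; the corpus, labels, layer sizes, and epoch count are all placeholders:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# hypothetical data: one short document and one colour label per sample
corpus = ["deep red sunset", "calm blue sea", "green grass grows", "bright yellow sun"] * 10
labels = ["red", "blue", "green", "yellow"] * 10

X = TfidfVectorizer().fit_transform(corpus).toarray()
le = LabelEncoder()
y = le.fit_transform(labels)              # 1D integer targets, shape (n_samples,)

model = Sequential([
    Input(shape=(X.shape[1],)),
    Dense(16, activation='relu'),
    Dense(4, activation='softmax'),       # one output unit per class
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',  # matches the 1D integer targets
              metrics=['accuracy'])
model.fit(X, y, epochs=5, verbose=0)

y_pred = model.predict(X)                 # (n_samples, 4) array of class probabilities
print(le.inverse_transform(np.argmax(y_pred, axis=-1))[:3])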