
I'm currently working on a text classification problem that requires classifying text into one of four labels. After encoding, each y value should be one of [0, 1, 2, 3], representing the predicted label.

However, the predictions this model makes seem to range in (0, 1), and I'm a bit confused. Also, can anyone clarify whether this is an ANN or an RNN? I have zero experience with TensorFlow and am still struggling...

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(16, activation='relu'))
model.add(Dense(4, activation='softmax'))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

import numpy as np
from sklearn.preprocessing import LabelEncoder

# encode the labels as integers
label_encoder = LabelEncoder()
y_train = np.array(label_encoder.fit_transform(train_labels))
x_train = np.array(train_features)
y_true = np.array(label_encoder.fit_transform(dev_label))

# fit the model and predict on the dev set
model.fit(x_train, y_train, epochs=1)
y_pred = model.predict(dev_features)

and the error message: Classification metrics can't handle a mix of multiclass and continuous-multioutput targets


helen

2 Answers

5

Let's say the target column has 4 unique values: red, blue, green, and yellow, and the corpus has been converted to TF-IDF values. The first 3 rows look like this:

word_1    word_2    target
0.567     0.897     red
0.098     0.238     blue
0.660     0.786     green
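
As an aside, a feature matrix of TF-IDF values like this is typically produced with scikit-learn's TfidfVectorizer; a minimal sketch (the corpus here is made up purely for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical three-document corpus, one document per target row above.
corpus = ["red apples and red roses",
          "the sky is blue",
          "green grass everywhere"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)  # sparse matrix, (n_samples, n_vocab)
train_features = tfidf.toarray()          # dense array of TF-IDF values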

One-Hot Encoding

After one-hot encoding the target, your target looks like an array of the form:

array([[1., 0., 0., 0.],    <- category 'red'
       [0., 1., 0., 0.],    <- category 'blue'
       [0., 0., 1., 0.],    <- category 'green'
       ...])

Here the target is of shape (n_samples, n_classes), which is (n, 4). In this case the final activation has to be softmax (sigmoid would suit a multi-label problem, where one sample can carry several classes at once), and you will train your model with the categorical_crossentropy loss. The code answering your question will be:

model.add(Dense(4, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
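
If your labels are currently a 1D integer array, one way to obtain this (n, 4) one-hot target is Keras's to_categorical; a small sketch, assuming y_train already holds integer labels 0-3:

from tensorflow.keras.utils import to_categorical

# Turn integer labels 0..3 into one-hot rows of shape (n_samples, 4),
# matching the 4-unit softmax output layer.
y_train_onehot = to_categorical(y_train, num_classes=4)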

Label-Encoding

After label-encoding the target, your target looks like an array of the form:

array([2, 0, 1, ...])

i.e. a 1D array of shape (n_samples,). Here the code will be:

model.add(Dense(4, activation='softmax'))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
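
For reference, this is what LabelEncoder actually produces for the example classes above; note that it assigns integers in alphabetical order of the class names (a quick sketch):

from sklearn.preprocessing import LabelEncoder

labels = ['red', 'blue', 'green', 'yellow']
encoder = LabelEncoder()
print(encoder.fit_transform(labels))  # [2 0 1 3]: blue=0, green=1, red=2, yellow=3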

Prediction

These numbers are the probability of each class for the given input sample. For example, [[0.4846592 0.5153408]] (a two-class example) means that the given sample belongs to class 0 with a probability of around 0.48 and to class 1 with a probability of around 0.52. Since you want to take the class with the highest probability, you can use np.argmax to find which index (i.e. 0 or 1) holds the maximum:

import numpy as np

# collapse (n_samples, n_classes) probabilities to integer class indices
pred_class = np.argmax(y_pred, axis=-1)

Further, this has nothing to do with the loss function of the model. These probabilities are produced by the last layer of your model, which very likely uses softmax as its activation function to normalize the output into a probability distribution.
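
If you then want the original string labels back rather than integer indices, the LabelEncoder fitted on the training labels can invert the mapping (a sketch, assuming label_encoder is the fitted encoder from the question):

# Map predicted class indices (0..3) back to the original label strings.
pred_labels = label_encoder.inverse_transform(pred_class)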

Conclusion

  • The error you are getting is because the loss function is being used incorrectly.
  • If you have a 1D integer-encoded (LabelEncoded) target, you should use sparse_categorical_crossentropy as the loss function.
  • If you have one-hot encoded your target into a 2D shape (n_samples, n_classes), you should use categorical_crossentropy (see the end-to-end sketch below).
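
Putting it together for your label-encoded 1D target, a minimal end-to-end sketch could look like the following. It keeps sparse_categorical_crossentropy and applies np.argmax before computing any sklearn metric, which is exactly what the error message complains about (x_train, y_train, dev_features and y_true are assumed to be prepared as in the question):

import numpy as np
from sklearn.metrics import accuracy_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(16, activation='relu'))
model.add(Dense(4, activation='softmax'))
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=1)

# predict() returns per-class probabilities of shape (n_samples, 4);
# argmax reduces them to integer class labels of shape (n_samples,).
y_pred = model.predict(dev_features)
pred_class = np.argmax(y_pred, axis=-1)

# Both arguments are now 1D integer arrays, so sklearn metrics work.
print(accuracy_score(y_true, pred_class))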
iamarchisha
  • Hi, the error message is still the same as shown in the question. – helen May 04 '21 at 12:09
  • Could you show the code you used to one-hot encode the target? The error message essentially means there is a shape mismatch: you are passing a 1D array, but your last layer outputs 4 values per sample, or something like that. More detailed code would help me see where the shape difference lies. – iamarchisha May 04 '21 at 13:15
  • The training set with TF-IDF values was already given, so it was read directly into train_features and I just transformed it into an np.array. The encoder just encodes the labels. – helen May 04 '21 at 13:40
  • I have updated the answer. Basically there are 2 scenarios: a) a one-hot encoded target and b) a label-encoded target. The loss function differs between the two. You can read more about it at: https://medium.com/deep-learning-with-keras/which-activation-loss-functions-in-multi-class-clasification-4cd599e4e61f – iamarchisha May 05 '21 at 06:07
  • Thanks so much for your detailed answer! However, the predictions don't seem to be one of (0, 1, 2, 3). I have put the predictions and my true labels in the question section. – helen May 07 '21 at 09:31
  • add this: `pred_class = np.argmax(y_pred, axis=-1) ` – iamarchisha May 07 '21 at 12:01
0

The final Dense layer should have dimension 4, and the activation function should be "softmax" instead of "sigmoid", since we are performing multi-class (more than two classes) classification. Also, change the loss function to "categorical_crossentropy".

Your code sample will look like this:

model.add(Dense(16, activation='relu'))
model.add(Dense(4, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
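
Note that categorical_crossentropy expects a one-hot target of shape (n_samples, 4); if the labels are still 1D integers you will get the shape error mentioned in the comments below. Either keep sparse_categorical_crossentropy, or convert the labels first; a sketch of the conversion, assuming y_train holds integer labels 0-3:

from tensorflow.keras.utils import to_categorical

# Convert 1D integer labels into one-hot rows so the target shape
# matches the (None, 4) output of the softmax layer.
y_train = to_categorical(y_train, num_classes=4)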
Kabilan Mohanraj
  • I did try that, but the error goes: ValueError: Shapes (None, 1) and (None, 4) are incompatible – helen May 04 '21 at 10:43
  • Can you post the code for the full model? – Kabilan Mohanraj May 04 '21 at 10:48
  • When we use softmax as the activation function, we [one-hot encode](https://www.tensorflow.org/api_docs/python/tf/keras/utils/to_categorical) the target variable. Can you try that? I'm sorry I missed that you had label encoded the target variable. – Kabilan Mohanraj May 04 '21 at 11:05
  • My feature values (train_features) are a list of size no_of_instances * no_of_vocab, and the values are TF-IDF values giving the importance of each word in my vocab.txt (a given dictionary). Here I only one-hot encoded the y values, not the x (feature) values. – helen May 04 '21 at 11:08
  • On the last layer please make sure to change the activation to "softmax". I see in the code that it is still "sigmoid". – Kabilan Mohanraj May 04 '21 at 11:38
  • Sorry, that's a typo... I removed the first layer and changed the activation to softmax, but the error still exists. – helen May 04 '21 at 11:48
  • Check out this [link](https://www.kaggle.com/ismu94/tf-idf-deep-neural-net). It is similar to what you are trying to do. – Kabilan Mohanraj May 04 '21 at 13:24