I ran into a problem while designing my Keras model.
The training input to the model consists of two sequential character-encoded lists and one non-sequential feature list. The training output is a list of probabilities over 5 classes. The test data has the same input features, but its output is a single class label rather than a probability distribution. The task is to build a model that learns from the training probabilities and predicts the actual class on the test data.
For example, the data looks like this:

X_train, X_test = sequential feature 1, sequential feature 2, non-sequential feature 3
y_train = probability of class 1, probability of class 2, ..., probability of class 5
y_test = 0/1, 0/1, ..., 0/1

A single sample looks like:

X = [0, 0, 0, 11, 21, 1] + [0, 0, 0, 0, 0, 121, 1, 16] + [1, 0, 0.543, 0.764, 1, 0, 1]
y_train = [0.132561, 0.46975598, 0.132561, 0.132561, 0.132561]
y_test = [0, 1, 0, 0, 0]
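Since the three feature groups feed separate branches, I keep them as three separate arrays rather than one concatenated vector. A minimal sketch of the preprocessing, assuming padded sequence lengths of 6 and 8 as in the example (all variable names here are hypothetical stand-ins for my real data):

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy stand-ins for the real character-encoded and dense data.
seq1_raw = [[11, 21, 1], [5, 7, 2, 9]]
seq2_raw = [[121, 1, 16], [3, 4]]
feats = np.array([[1, 0, 0.543, 0.764, 1, 0, 1],
                  [0, 1, 0.100, 0.200, 0, 1, 0]], dtype="float32")

# Pad each character sequence to its branch's fixed length.
# A multi-input model is fed a list of arrays, one per input.
X1 = pad_sequences(seq1_raw, maxlen=6)   # shape (n_samples, 6)
X2 = pad_sequences(seq2_raw, maxlen=8)   # shape (n_samples, 8)
X = [X1, X2, feats]
```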
I have built two CNN branches for the sequential data and a plain dense branch for the non-sequential data, then concatenated them into one mixed model with some dense layers and dropout. I use categorical_crossentropy as my loss function, although my targets are not strictly one-hot encoded. Will that be a problem? Is there any suggestion to improve the model?
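For reference, here is a minimal sketch of that architecture, assuming a character vocabulary of 200, sequence lengths of 6 and 8, 7 dense features, and 5 classes (all hypothetical values chosen to match the example above, not my exact setup):

```python
from tensorflow.keras import layers, models

VOCAB, EMB = 200, 32  # assumed vocabulary and embedding sizes

def cnn_branch(seq_len, name):
    # Embedding + Conv1D + global max pooling over one character sequence.
    inp = layers.Input(shape=(seq_len,), name=name)
    x = layers.Embedding(VOCAB, EMB)(inp)
    x = layers.Conv1D(64, 3, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)
    return inp, x

in1, b1 = cnn_branch(6, "seq1")
in2, b2 = cnn_branch(8, "seq2")

# Plain dense branch for the non-sequential features.
in3 = layers.Input(shape=(7,), name="dense_feats")
b3 = layers.Dense(32, activation="relu")(in3)

# Concatenate the three branches into the mixed head.
x = layers.concatenate([b1, b2, b3])
x = layers.Dense(64, activation="relu")(x)
x = layers.Dropout(0.5)(x)
out = layers.Dense(5, activation="softmax")(x)

model = models.Model(inputs=[in1, in2, in3], outputs=out)
# categorical_crossentropy computes -sum(y_true * log(y_pred)),
# so it accepts soft probability vectors as labels, not only one-hot.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit([X1, X2, feats], y_train, ...)
```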
PS: taking the argmax of the training probabilities does not always give the actual label. For example, for the probability vector

[0.33719498, 0.46975598, 0.06434968, 0.06434968, 0.06434968]

the actual label could be

[1, 0, 0, 0, 0]
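For concreteness, the loss being minimized is -sum(y_true * log(y_pred)), which stays well defined when y_true is a full probability vector rather than one-hot. A quick numpy check, with a hypothetical model output for y_pred:

```python
import numpy as np

# Soft target from the example above and a hypothetical prediction.
y_true = np.array([0.33719498, 0.46975598, 0.06434968, 0.06434968, 0.06434968])
y_pred = np.array([0.30, 0.45, 0.09, 0.08, 0.08])

# Categorical cross-entropy: minimized when y_pred matches the whole
# distribution, not just the argmax class.
loss = -np.sum(y_true * np.log(y_pred))
print(loss)
```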