I ran into a problem while designing my Keras model.
The training input to the model consists of two sequential character-encoded lists and one non-sequential feature list. The training output is a list of probabilities over 5 classes. The test data has the same input features, but its output is a single class label rather than a probability distribution. The task is to build a model that learns from the training probabilities and predicts the actual class on the test data.
For example, the data looks like this:

X_train, X_test = sequential feature 1, sequential feature 2, non-sequential feature 3
y_train = probability of class 1, probability of class 2, ..., probability of class 5
y_test = 0/1, 0/1, ..., 0/1

A single sample looks like:

X = [0, 0, 0, 11, 21, 1] + [0, 0, 0, 0, 0, 121, 1, 16] + [1, 0, 0.543, 0.764, 1, 0, 1]
y_train = [0.132561, 0.46975598, 0.132561, 0.132561, 0.132561]
y_test = [0, 1, 0, 0, 0]
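Since the three feature groups feed separate branches, I keep them as three separate arrays rather than one concatenated vector. A minimal sketch of the preprocessing, assuming padded sequence lengths of 6 and 8 as in the example (all variable names here are hypothetical stand-ins for my real data):

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Toy stand-ins for the real character-encoded and dense data.
seq1_raw = [[11, 21, 1], [5, 7, 2, 9]]
seq2_raw = [[121, 1, 16], [3, 4]]
feats = np.array([[1, 0, 0.543, 0.764, 1, 0, 1],
                  [0, 1, 0.100, 0.200, 0, 1, 0]], dtype="float32")

# Pad each character sequence to its branch's fixed length.
# A multi-input model is fed a list of arrays, one per input.
X1 = pad_sequences(seq1_raw, maxlen=6)   # shape (n_samples, 6)
X2 = pad_sequences(seq2_raw, maxlen=8)   # shape (n_samples, 8)
X = [X1, X2, feats]
```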
I have built two CNN branches for the sequential data and a plain dense branch for the non-sequential data, then concatenated them into one mixed model with some dense layers and dropout. I use categorical_crossentropy as my loss function, although my targets are not strictly one-hot encoded. Will that be a problem? Is there any suggestion to improve the model?
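For reference, here is a minimal sketch of that architecture, assuming a character vocabulary of 200, sequence lengths of 6 and 8, 7 dense features, and 5 classes (all hypothetical values chosen to match the example above, not my exact setup):

```python
from tensorflow.keras import layers, models

VOCAB, EMB = 200, 32  # assumed vocabulary and embedding sizes

def cnn_branch(seq_len, name):
    # Embedding + Conv1D + global max pooling over one character sequence.
    inp = layers.Input(shape=(seq_len,), name=name)
    x = layers.Embedding(VOCAB, EMB)(inp)
    x = layers.Conv1D(64, 3, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)
    return inp, x

in1, b1 = cnn_branch(6, "seq1")
in2, b2 = cnn_branch(8, "seq2")

# Plain dense branch for the non-sequential features.
in3 = layers.Input(shape=(7,), name="dense_feats")
b3 = layers.Dense(32, activation="relu")(in3)

# Concatenate the three branches into the mixed head.
x = layers.concatenate([b1, b2, b3])
x = layers.Dense(64, activation="relu")(x)
x = layers.Dropout(0.5)(x)
out = layers.Dense(5, activation="softmax")(x)

model = models.Model(inputs=[in1, in2, in3], outputs=out)
# categorical_crossentropy computes -sum(y_true * log(y_pred)),
# so it accepts soft probability vectors as labels, not only one-hot.
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit([X1, X2, feats], y_train, ...)
```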
PS: taking the argmax of the training probabilities does not always give the actual label. For example, for the probability vector

[0.33719498, 0.46975598, 0.06434968, 0.06434968, 0.06434968]

the actual label could be

[1, 0, 0, 0, 0]
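For concreteness, the loss being minimized is -sum(y_true * log(y_pred)), which stays well defined when y_true is a full probability vector rather than one-hot. A quick numpy check, with a hypothetical model output for y_pred:

```python
import numpy as np

# Soft target from the example above and a hypothetical prediction.
y_true = np.array([0.33719498, 0.46975598, 0.06434968, 0.06434968, 0.06434968])
y_pred = np.array([0.30, 0.45, 0.09, 0.08, 0.08])

# Categorical cross-entropy: minimized when y_pred matches the whole
# distribution, not just the argmax class.
loss = -np.sum(y_true * np.log(y_pred))
print(loss)
```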