I have a multiclass classification problem; let's call the classes A, B, C and D. My data has the following shape:
X=[#samples, #features, 1], y=[#samples,1].
To be more specific, the y looks like this:
[['A'], ['B'], ['D'], ['A'], ['C'], ...]
Training a Random Forest classifier on these labels works fine. However, I have read several times that class labels also need to be one-hot encoded. After one-hot encoding, y is
[[1,0,0,0], [0,1,0,0], ...]
and has the shape
[#samples, 4]
The problem arises when I try to use this one-hot encoded y as the target for the classifier. The model then predicts each of the four columns individually, meaning that it can also produce an output like [0 0 0 0], which I don't want.
rfc.classes_
returns
# [array([0, 1]), array([0, 1]), array([0, 1]), array([0, 1])]
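For reference, here is a minimal sketch that reproduces what I am seeing. It assumes scikit-learn's RandomForestClassifier and uses synthetic data in place of my real data: X is taken as 2-D ([#samples, #features]) and the string labels as 1-D, which differs slightly from the shapes above. The names rfc_labels and rfc_onehot are just for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelBinarizer

# Synthetic stand-in data (assumption): 40 samples, 5 features, 10 samples per class
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.array(list("ABCD") * 10)  # 1-D string labels

# Case 1: fit on the string labels directly (a single multiclass target)
rfc_labels = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(rfc_labels.classes_)  # one array holding the four class names

# Case 2: fit on the one-hot encoded labels, shape [#samples, 4]
Y = LabelBinarizer().fit_transform(y)
rfc_onehot = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, Y)
print(rfc_onehot.classes_)  # a list of four arrays, one per column

# Each row of the prediction has four independent 0/1 entries
print(rfc_onehot.predict(X[:3]))
```

In the first case classes_ is a single array of the four class names, while in the second case it is a list with one [0, 1] array per column, which matches the output shown above.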
How can I tell the model that the labels are one-hot encoded, rather than multiple labels to be predicted independently of each other? Do I need to change my y, or do I need to change some settings of the model?