I have a vanilla LSTM model that classifies input data by outputting a probability distribution over 6 categories. Nothing too crazy.
Now, the model works and gives me an output; I take the argmax to categorise my input. However, I think we can do more.
Instead of just the predicted category, it is really useful for me to see the probability distribution output by my LSTM, something like

    [0.0528042, 0.11904617, 0.27744624, 0.37874526, 0.13942425, 0.03253399]
as this tells me the second-best guess, the third, and so on, and how confident the LSTM was when labelling the input with a given category.
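To make this concrete, here is a minimal sketch of how I extract the ranked guesses and the confidence from one output vector (the array is the example distribution above; NumPy is my assumption, not part of the model code):

```python
import numpy as np

# Softmax output from my LSTM for one input (6 categories).
probs = np.array([0.0528042, 0.11904617, 0.27744624,
                  0.37874526, 0.13942425, 0.03253399])

# Rank category indices from most to least likely.
ranking = np.argsort(probs)[::-1]

best, second, third = ranking[:3]   # top-3 guesses
confidence = probs[best]            # probability of the argmax category
```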
What's interesting is that my categories are ordered and strongly correlated: say, if categories 1 and 2 correspond to 'really large value' and 'large value', I know my value is big, as opposed to categories 3 and 4, which are 'small value' and 'really small value'.
Is there any way to harness the fact that the categories are closely related in order to get a better output? Of course, I don't want to simply collapse everything into two outputs ('large' and 'small'), as the full probability distribution is really important to me.
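For context, the coarse grouping I want to avoid as my only output would just sum probability mass per group; a minimal sketch (the group boundaries are my assumption from the description above, with categories 5 and 6 left ungrouped):

```python
import numpy as np

# Same example distribution as above.
probs = np.array([0.0528042, 0.11904617, 0.27744624,
                  0.37874526, 0.13942425, 0.03253399])

# Hypothetical grouping: categories 1-2 -> 'large', 3-4 -> 'small'
# (0-based indices 0:2 and 2:4); this discards the per-category detail.
p_large = probs[0:2].sum()
p_small = probs[2:4].sum()
```

This gives a quick 'big vs small' summary, but the per-category probabilities are exactly what I lose, which is why I'm looking for a way to bake the category ordering into the model instead.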