
I have a vanilla LSTM model which classifies the input data by outputting a probability distribution over 6 categories. Nothing too crazy.

Now, the model works and gives me an output, from which I take the max to categorise my input. However, I think we can do more.

Instead of just the chosen category, it is really useful for me to see the probability distribution output by my LSTM; something like

[0.0528042, 0.11904617, 0.27744624, 0.37874526, 0.13942425, 0.03253399]

as this information can tell me the second-best guess, the third, and so on, and how confident the LSTM was when labelling the input with each category.
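
For example, a small illustrative snippet of what I mean (assuming the distribution above is a NumPy array called `probs`; the naming is mine):

```python
import numpy as np

probs = np.array([0.0528042, 0.11904617, 0.27744624, 0.37874526, 0.13942425, 0.03253399])

# Categories ranked from most to least likely, with their confidences.
ranking = np.argsort(probs)[::-1]
print(ranking)         # [3 2 4 1 0 5] -> best guess is category 3, second-best is 2, ...
print(probs[ranking])  # the corresponding confidence for each guess
```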

What's interesting is that my categories are very correlated; say, if categories 1 and 2 correspond to 'really large value' and 'large value', I know my value is big, as opposed to categories 3 and 4, which are 'small value' and 'really small value'.

Is there any way to harness the fact that the categories are closely related in order to have a better output? Of course, I don't want to simply have two outputs ('large' and 'small'), as the probability distribution is really important for me.

Landmaster
  • no code, no data, please read how to ask a question and what constitutes a good question – gold_cy Aug 04 '17 at 02:02
  • Do you really need me to put in mock-data for you to get the idea? I can do that if it makes my writing clearer, but I don't see a need for it. Please let me know what data you want to see that clears up your confusions and I will provide you with it. – Landmaster Aug 04 '17 at 02:31

1 Answer


You will need to implement a custom loss function to encode the inter-class relationship.

If your 6 classes are ordered (say, ["extremely large", "very large", "large", "small", "very small", "extremely small"]), a suitable loss may be the 1D Wasserstein distance (a.k.a. earth mover's distance, EMD).

There's a closed-form formula for the one-dimensional EMD. For example, you can try to implement what has been described in this paper.

1D-EMD: `EMD(p, q) = Σ_i |CDF_p(i) − CDF_q(i)|`
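
As a rough illustration, here is a minimal sketch of such a custom loss for a Keras/TensorFlow model (the name `emd_loss` and the squared variant are my own illustrative choices, not necessarily exactly what the paper uses); it assumes `y_true` is one-hot and that the class order encodes the ordinal relationship:

```python
import tensorflow as tf

def emd_loss(y_true, y_pred):
    # Cumulative distributions along the (ordered) class axis.
    cdf_true = tf.cumsum(y_true, axis=-1)
    cdf_pred = tf.cumsum(y_pred, axis=-1)
    # Closed-form 1D EMD: (squared) area between the two CDFs, so predicting
    # "small" when the truth is "extremely large" costs more than predicting
    # "very large".
    return tf.reduce_mean(tf.reduce_sum(tf.square(cdf_true - cdf_pred), axis=-1))

# Drop-in replacement for categorical cross-entropy:
# model.compile(optimizer='adam', loss=emd_loss, metrics=['accuracy'])
```

The squared version tends to be smoother to optimise than the absolute-value version, but either one respects the ordering of the classes.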

Yu-Yang
  • Oh, I see I see! The loss function would then probably help the distribution, right? As in, it would most likely not give me 0.5 for extremely large and 0.5 for extremely small in the same output? Also, would this, then, replace the categorical cross entropy loss function? – Landmaster Aug 04 '17 at 05:22
  • Yes, it does so by penalizing `truth="extremely large", prediction="small"` more than `truth="extremely large", prediction="large"`, and yes, you can replace the categorical cross entropy loss with it. – Yu-Yang Aug 04 '17 at 05:41
  • Champion. Thanks, Yu! – Landmaster Aug 04 '17 at 05:43