
For each input, I have an associated 49x2 matrix. Here's what one input-output pair looks like:

input :
[Car1, Car2, Car3 ..., Car118]

output :
[[Label1 Label2]
 [Label1 Label2]
      ...
 [Label1 Label2]]

where Label1 and Label2 are both label-encoded, with 1200 and 1300 different classes respectively.

Just to make sure: is this what is called a multi-output multi-class problem?

I tried flattening the output, but I feared the model wouldn't understand that the labels in the same column share the same set of classes.

Is there a Keras layer that handles output of this peculiar array shape?


Generally, multi-class problems correspond to models outputting a probability distribution over the set of classes (typically scored against the one-hot encoding of the actual class through cross-entropy). Now, independently of whether you structure it as one single output, two outputs, 49 outputs or 49 x 2 = 98 outputs, that would mean having 1,200 x 49 + 1,300 x 49 = 122,500 output units - which is something a computer can handle, but maybe not the most convenient thing to have. You could try having each class output be a single (e.g. linear) unit and round its value to choose the label, but, unless the labels have some numerical meaning (e.g. order, sizes, etc.), that is not likely to work.
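A minimal sketch of that direct approach, with two output heads producing one softmax distribution per row of the 49x2 matrix (the hidden layer size here is an arbitrary assumption, not something from the question):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_CARS = 118       # length of the input vector
N1, N2 = 1200, 1300  # number of classes for Label1 / Label2
ROWS = 49            # rows in the output matrix

inputs = keras.Input(shape=(NUM_CARS,))
hidden = layers.Dense(64, activation="relu")(inputs)  # size is an arbitrary choice

# One softmax distribution per row: produce 49 * n_classes units,
# reshape to (49, n_classes) and apply softmax over the last axis.
label1 = layers.Dense(ROWS * N1)(hidden)
label1 = layers.Reshape((ROWS, N1))(label1)
label1 = layers.Softmax(axis=-1, name="label1")(label1)

label2 = layers.Dense(ROWS * N2)(hidden)
label2 = layers.Reshape((ROWS, N2))(label2)
label2 = layers.Softmax(axis=-1, name="label2")(label2)

model = keras.Model(inputs, [label1, label2])
# Sparse cross-entropy scores integer labels of shape (batch, 49)
# against the (batch, 49, n_classes) softmax outputs.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

y1, y2 = model(np.zeros((2, NUM_CARS), dtype="float32"))
```

This is where the 122,500 output units show up: the two final Dense layers alone have 49 x 1,200 + 49 x 1,300 units between them.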

If the order of the elements in the input has some meaning (that is, shuffling it would affect the output), I think I'd approach the problem with an RNN, like an LSTM or a bidirectional LSTM model, with two outputs. Use return_sequences=True and TimeDistributed Dense softmax layers for the outputs, and for each 118-long input you'd have 118 pairs of outputs; then you can use temporal sample weighting to drop, for example, the first 69 (or maybe the 35 first and the 34 last if you're using a bidirectional model) and compute the loss with the remaining 49 pairs of labellings. Or, if that makes sense for your data (maybe it doesn't), you could go with something more advanced like CTC, which is also implemented in Keras (thanks @indraforyou)!
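A sketch of that sequence approach, assuming (this is an assumption, not stated in the question) that each car can be described by a single feature value:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

SEQ_LEN = 118        # the 118 cars, treated as time steps
N1, N2 = 1200, 1300  # number of classes for Label1 / Label2
KEEP = 49            # only 49 steps actually carry labels

# One feature per car; widen the last dimension if cars have more attributes.
inputs = keras.Input(shape=(SEQ_LEN, 1))
hidden = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(inputs)

# A pair of softmax predictions at every time step.
label1 = layers.TimeDistributed(
    layers.Dense(N1, activation="softmax"), name="label1")(hidden)
label2 = layers.TimeDistributed(
    layers.Dense(N2, activation="softmax"), name="label2")(hidden)

model = keras.Model(inputs, [label1, label2])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Temporal sample weights: zero for the dropped steps, one for the 49
# scored steps (pass an array like this as sample_weight to fit()).
weights = np.zeros((1, SEQ_LEN))
weights[:, -KEEP:] = 1.0

y1, y2 = model(np.zeros((2, SEQ_LEN, 1), dtype="float32"))
```

Here the last 49 steps are the scored ones; for a bidirectional model you might instead zero the 35 first and 34 last weights, as described above.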

If the order in the input has no meaning but the order of the outputs does, then you could have an RNN where your input is the original 118-long vector plus a pair of labels (each one-hot encoded), and the output is again a pair of labels (again two softmax layers). The idea would be that you get one "row" of the 49x2 output on each frame, and then you feed it back to the network along with the initial input to get the next one; at training time, you would have the input repeated 49 times along with the "previous" label (an empty label for the first one).
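A training-time sketch of that feedback scheme: the input is unrolled over 49 steps, each step concatenating the original vector with the previous (one-hot) label pair. All layer sizes are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_CARS = 118
N1, N2 = 1200, 1300  # number of classes for Label1 / Label2
ROWS = 49

# At each of the 49 steps the network sees the original 118-long vector
# concatenated with the previous pair of labels, one-hot encoded
# (all zeros for the "empty" label on the first step).
STEP_SIZE = NUM_CARS + N1 + N2
inputs = keras.Input(shape=(ROWS, STEP_SIZE))
hidden = layers.LSTM(128, return_sequences=True)(inputs)  # size is arbitrary

label1 = layers.TimeDistributed(
    layers.Dense(N1, activation="softmax"), name="label1")(hidden)
label2 = layers.TimeDistributed(
    layers.Dense(N2, activation="softmax"), name="label2")(hidden)

model = keras.Model(inputs, [label1, label2])

y1, y2 = model(np.zeros((2, ROWS, STEP_SIZE), dtype="float32"))
```

At prediction time you would run the steps one at a time, feeding each predicted label pair back into the next step's input.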

If there are no sequential relationships to exploit (i.e. the order of the input and the output does not have a special meaning), then the problem would only be truly represented by the initial 122,500 output units (plus all the hidden units you may need to get those right). You could also try some kind of middle ground between a regular network and an RNN where you have the two softmax outputs and, along with the 118-long vector, you include the "id" of the output that you want (e.g. as a 49-long one-hot encoded vector); if the "meaning" of each label at each of the 49 outputs is similar, or comparable, it may work.
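That middle ground could look like the following sketch: one query per row, where the one-hot row "id" selects which of the 49 outputs is being asked for (hidden layer size is an assumption):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_CARS = 118
N1, N2 = 1200, 1300  # number of classes for Label1 / Label2
ROWS = 49

cars = keras.Input(shape=(NUM_CARS,))
row_id = keras.Input(shape=(ROWS,))   # one-hot id of the requested output row

hidden = layers.Concatenate()([cars, row_id])
hidden = layers.Dense(256, activation="relu")(hidden)  # size is arbitrary

# One label pair per query; run 49 queries to recover the full matrix.
label1 = layers.Dense(N1, activation="softmax", name="label1")(hidden)
label2 = layers.Dense(N2, activation="softmax", name="label2")(hidden)

model = keras.Model([cars, row_id], [label1, label2])

y1, y2 = model([np.zeros((2, NUM_CARS), dtype="float32"),
                np.zeros((2, ROWS), dtype="float32")])
```

Each training example becomes 49 (input, id) pairs, one per row of the output matrix.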

jdehesa
  • There is a Keras example using CTC loss .. check https://github.com/fchollet/keras/blob/master/examples/image_ocr.py .. both TensorFlow and Theano are supported – indraforyou Jan 18 '17 at 13:17
  • @indraforyou WAAAT Keras *does* have CTC?! I think I read somewhere it didn't (probably about some old version) and didn't even check properly... shame on me! :S Thanks a lot for mentioning it. – jdehesa Jan 18 '17 at 13:33
  • Thanks! Each of my output sets does have an inner hierarchy. My next milestone is to implement a variable number of rows in each output set, which would make this a variable multi-output multi-class problem. Do you think RNNs/LSTMs could handle that task too? – Julien Bélanger Jan 18 '17 at 14:34
  • @JulienBélanger I think RNNs may help, although if you do not know the length of the output in advance the tricky part, of course, would be to know when the interesting output starts and finishes at prediction time. If you consider the input as a sequence, then the output would need to be at most as long as the input, and you may use something like CTC (although that is more something like "segmenting the input sequence and assigning a label to each segment", which I'm not sure is what you want). If your input is not a sequence and you still use an RNN, you need to work out when to stop! – jdehesa Jan 18 '17 at 14:47
  • @jdehesa no problem – indraforyou Jan 18 '17 at 18:15
  • @jdehesa Hi, i have a similar question that I thought you might have some interesting suggestions: https://stackoverflow.com/questions/62077273/how-to-perform-multiclass-multioutput-classification-using-lstm please let me know your thoughts. thank you :) – EmJ May 29 '20 at 01:58