
I have a multi-layer neural net that does some processing, and I want an SVM at the end. I have googled and searched Stack Exchange, and it seems this is easily implemented in Keras using the hinge or categorical_hinge loss function. However, I am confused as to which one to use.

My examples are to be classified into one of two classes, either class 0 or class 1. So I can do it in one of two ways:

Method 1: https://github.com/keras-team/keras/issues/2588 (uses hinge) or "How do I use categorical_hinge in Keras?" (uses categorical_hinge):

The labels will have shape (n_samples, 2), one-hot encoded: each column is 0 or 1, indicating whether the example belongs to that class.

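from keras.layers import Dense, Activation
from keras.regularizers import l2

# `model` is the existing Sequential network; this adds the SVM output layer
nb_classes = 2
# the regularizer must be passed to Dense itself (Keras 2 name: kernel_regularizer)
model.add(Dense(nb_classes, kernel_regularizer=l2(0.01)))
model.add(Activation('linear'))

model.compile(loss='categorical_hinge',  # or 'hinge'? This is my question
              optimizer='adadelta',
              metrics=['accuracy'])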

Is the predicted class then the output node with the higher of the two values?
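With two linear output nodes, the natural decision rule is to pick the node with the larger score. Below is a plain-Python sketch of the per-sample formula that Keras 2.x uses for `categorical_hinge` (score of the true class versus the largest other entry, with a margin of 1), plus that argmax decision. The helper names are hypothetical and the code does no batching; treat it as an illustration of the formula, not as Keras itself.

```python
# Plain-Python sketch of the categorical_hinge formula as written in the
# Keras 2.x losses module, for a single sample (no batching, no backend).
def categorical_hinge(y_true, y_pred):
    # score of the true class (y_true is one-hot, so only one term survives)
    pos = sum(t * p for t, p in zip(y_true, y_pred))
    # largest of the other entries; the true-class slot contributes 0 here
    neg = max((1 - t) * p for t, p in zip(y_true, y_pred))
    return max(0.0, neg - pos + 1.0)


def predict_class(y_pred):
    # hypothetical decision rule: index of the output node with the higher score
    return max(range(len(y_pred)), key=lambda i: y_pred[i])


scores = [-0.3, 1.2]   # hypothetical linear outputs of the 2-node layer
label = [0, 1]         # one-hot: the sample belongs to class 1
print(categorical_hinge(label, scores))  # 0.0 (true-class score clears the margin)
print(predict_class(scores))             # 1
```

Note that because the true-class slot contributes 0 to `neg`, a negative competing score is effectively clipped at 0 in this formula.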

Method 2: https://github.com/keras-team/keras/issues/2830 (uses hinge):

The first commenter there says that hinge is effectively a binary hinge (binary_hinge), that the labels must be -1 or 1 (for no or yes), and that the activation of the last (SVM) layer should be tanh with a single node. So it should look something like the following, but the labels will have shape (n_samples, 1) with values of either -1 or 1.

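from keras.layers import Dense, Activation
from keras.regularizers import l2

# single output node; the regularizer goes inside Dense (Keras 2 name: kernel_regularizer)
model.add(Dense(1, kernel_regularizer=l2(0.01)))
model.add(Activation('tanh'))

model.compile(loss='hinge',
              optimizer='adadelta',
              metrics=['accuracy'])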
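For Method 2, the single-output hinge loss and a matching decision rule can be sketched in plain Python. This assumes the Keras 2.x `hinge` definition, `mean(max(1 - y_true * y_pred, 0))` over the output nodes, which for one node reduces to `max(0, 1 - y * score)`. The helper names are hypothetical, and thresholding the tanh output at 0 is an assumption about how you would read off the class.

```python
# Plain-Python sketch of Method 2: one output node, labels in {-1, +1}.
# For a single node, Keras 2.x's hinge reduces to max(0, 1 - y * score).
def binary_hinge(y_true, score):
    return max(0.0, 1.0 - y_true * score)


def predict_class(score):
    # hypothetical rule: the sign of the output decides (ties at 0 go to class 1)
    return 1 if score >= 0 else 0


print(binary_hinge(1, 0.5))    # 0.5: correct side of the boundary, but inside the margin
print(binary_hinge(-1, 0.5))   # 1.5: wrong side of the boundary
print(predict_class(0.5))      # 1
```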

So which method is correct, or more desirable? I am unsure what to use, since there are multiple answers online and the Keras documentation says nothing about the hinge and categorical_hinge loss functions. Thank you!

Lim Kaizhuo

1 Answer


Might be a bit late, but here is my answer.

You can do it in multiple ways:

Since you have 2 classes, it is a binary problem and you can use the normal hinge loss. The network then only has to produce a single output, with labels of -1 and 1 as you said.

You can also use 2 outputs in the last layer; your labels then just have to be one-hot encodings, and you use categorical_hinge.
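Both label formats can be produced from plain 0/1 class labels. A minimal pure-Python sketch with hypothetical helper names (in Keras itself, `keras.utils.to_categorical` handles the one-hot case):

```python
# Hypothetical helpers turning 0/1 class labels into the two label formats.
def to_plus_minus_one(labels):
    # single output + hinge: class 0 -> -1, class 1 -> +1
    return [2 * y - 1 for y in labels]


def to_one_hot(labels, num_classes=2):
    # two outputs + categorical_hinge: class k -> a row with a 1 in column k
    return [[1 if k == y else 0 for k in range(num_classes)] for y in labels]


labels = [0, 1, 1, 0]
print(to_plus_minus_one(labels))  # [-1, 1, 1, -1]
print(to_one_hot(labels))         # [[1, 0], [0, 1], [0, 1], [1, 0]]
```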

As for the activation, a linear layer and a tanh would both make it an SVM; the tanh will just smooth the output.
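To see what the tanh changes under the hinge loss, the sketch below compares the single-output loss on a raw linear score with the loss on the same score passed through tanh, for a sample labeled +1. The `hinge` helper is hypothetical and follows the single-output formula `max(0, 1 - y * score)`. With a linear output the loss drops to exactly 0 once the score reaches the margin of 1, as in a standard SVM; tanh keeps the output below 1, so a correctly classified sample keeps a small loss that shrinks as the score grows.

```python
import math


# Hypothetical single-output hinge loss: max(0, 1 - y * score).
def hinge(y, score):
    return max(0.0, 1.0 - y * score)


for z in [0.5, 1.0, 2.0, 4.0]:
    linear = hinge(1, z)                # reaches exactly 0 once z >= 1
    squashed = hinge(1, math.tanh(z))   # tanh(z) < 1 here, so this stays above 0
    print(f"z={z}: linear={linear:.4f}  tanh={squashed:.4f}")
```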

I would suggest making it binary and using a tanh layer, but try both to see what works.

AndreasKS