This question really boils down to the difference between logistic regression and SVM for classification.
You can naively think of the whole deep learning pipeline as a magician: the magician takes your input data and hands you a set of engineered features, and you use those features to do the classification.
Depending on which loss you minimize, this classification problem can be solved with different kinds of classifiers. If you minimize cross-entropy, you are effectively applying logistic regression. If instead you minimize the margin (hinge) loss, you are effectively finding the support vectors, which is how an SVM works.
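To make the contrast concrete, here is a minimal NumPy sketch of the two losses for a raw model score $f$ and a label $y \in \{-1, +1\}$ (the function names are just for illustration):

```python
import numpy as np

# f: the model's raw score (logit) for a sample, y: its label in {-1, +1}
def hinge_loss(f, y):
    # SVM-style margin loss: exactly zero once y*f >= 1 (outside the margin)
    return np.maximum(0.0, 1.0 - y * f)

def logistic_loss(f, y):
    # cross-entropy for a {-1, +1} label; equivalent to log loss with a sigmoid
    return np.log1p(np.exp(-y * f))

scores = np.linspace(-3.0, 3.0, 7)
print(hinge_loss(scores, 1))     # hits 0 and stays there past the margin
print(logistic_loss(scores, 1))  # decays smoothly but never reaches 0
```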
You need to read about the role of kernels in the calculation of the loss (for example, here), but the TL;DR is that the loss computation contains a term $K(x_i, x_j)$, which is the kernel function and indicates the similarity of $x_i$ and $x_j$.
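For instance, a polynomial kernel of degree $d$ has the common form $K(x_i, x_j) = (x_i^\top x_j + c)^d$; the `poly_kernel` helper below and its defaults are illustrative assumptions:

```python
import numpy as np

# Hypothetical helper: degree-d polynomial kernel between two batches of features.
def poly_kernel(X, Y, degree=2, coef0=1.0):
    # K(x_i, x_j) = (x_i . x_j + coef0)^degree: larger when x_i and x_j are similar
    return (X @ Y.T + coef0) ** degree

X = np.random.randn(4, 8)   # 4 samples with 8 features each
K = poly_kernel(X, X)       # 4x4 Gram (similarity) matrix, quadratic by default
```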
So you can implement a custom loss that uses a polynomial kernel (quadratic in your case) and imitates the margin loss calculation.
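A possible sketch in PyTorch follows; the `QuadraticKernelSVM` head, the learnable `landmarks` standing in for support vectors, and the `margin_loss` helper are all illustrative assumptions, not a canonical recipe:

```python
import torch
import torch.nn as nn

# Sketch of an SVM-style head on top of learned features, assuming `x` comes
# from your network's penultimate layer. `landmarks` play the role of
# candidate support vectors, `alpha` the role of dual weights.
class QuadraticKernelSVM(nn.Module):
    def __init__(self, n_landmarks, feat_dim, coef0=1.0):
        super().__init__()
        self.landmarks = nn.Parameter(torch.randn(n_landmarks, feat_dim))
        self.alpha = nn.Parameter(torch.zeros(n_landmarks))
        self.bias = nn.Parameter(torch.zeros(1))
        self.coef0 = coef0

    def forward(self, x):
        # quadratic kernel K(x, l) = (x . l + coef0)^2 against each landmark
        K = (x @ self.landmarks.T + self.coef0) ** 2
        return K @ self.alpha + self.bias  # decision value f(x), shape (batch,)

def margin_loss(f, y):
    # hinge loss for y in {-1, +1}: penalize samples inside the margin
    return torch.clamp(1.0 - y * f, min=0.0).mean()

# usage sketch:
# feats = backbone(inputs)              # features from your network
# model = QuadraticKernelSVM(n_landmarks=32, feat_dim=feats.shape[1])
# loss = margin_loss(model(feats), y)   # y in {-1, +1}
```

Note that the landmarks here are learned jointly with everything else rather than selected from the training set, so this only loosely mimics true support vectors; a faithful kernel SVM would pick them from the data.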