This question really boils down to the difference between logistic regression and SVM for classification.
You can naively think of the whole deep learning pipeline as a magician: the magician takes your input data and hands you a set of engineered features, and you use those features to do the classification.
Depending on which loss you minimize, this classification problem can be solved with different kinds of classifiers. If you minimize cross-entropy, you are effectively applying logistic regression. If instead you minimize the margin (hinge) loss, you are effectively finding the support vectors, which is how an SVM works.
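To make the contrast concrete, here is a minimal NumPy sketch of the two losses for a raw model score $f$ and a label $y \in \{-1, +1\}$ (the function names are just for illustration):

```python
import numpy as np

# f: the model's raw score (logit) for a sample, y: its label in {-1, +1}
def hinge_loss(f, y):
    # SVM-style margin loss: exactly zero once y*f >= 1 (outside the margin)
    return np.maximum(0.0, 1.0 - y * f)

def logistic_loss(f, y):
    # cross-entropy for a {-1, +1} label; equivalent to log loss with a sigmoid
    return np.log1p(np.exp(-y * f))

scores = np.linspace(-3.0, 3.0, 7)
print(hinge_loss(scores, 1))     # hits 0 and stays there past the margin
print(logistic_loss(scores, 1))  # decays smoothly but never reaches 0
```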
You need to read about the role of kernels in the calculation of the loss (for example, here), but the TL;DR is that the loss computation contains a term $K(x_i, x_j)$, which is the kernel function and indicates the similarity of $x_i$ and $x_j$.
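For instance, a polynomial kernel of degree $d$ has the common form $K(x_i, x_j) = (x_i^\top x_j + c)^d$; the `poly_kernel` helper below and its defaults are illustrative assumptions:

```python
import numpy as np

# Hypothetical helper: degree-d polynomial kernel between two batches of features.
def poly_kernel(X, Y, degree=2, coef0=1.0):
    # K(x_i, x_j) = (x_i . x_j + coef0)^degree: larger when x_i and x_j are similar
    return (X @ Y.T + coef0) ** degree

X = np.random.randn(4, 8)   # 4 samples with 8 features each
K = poly_kernel(X, X)       # 4x4 Gram (similarity) matrix, quadratic by default
```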
So you can implement a custom loss that uses a polynomial kernel (quadratic in your case) and imitates the margin loss calculation.
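A possible sketch in PyTorch follows; the `QuadraticKernelSVM` head, the learnable `landmarks` standing in for support vectors, and the `margin_loss` helper are all illustrative assumptions, not a canonical recipe:

```python
import torch
import torch.nn as nn

# Sketch of an SVM-style head on top of learned features, assuming `x` comes
# from your network's penultimate layer. `landmarks` play the role of
# candidate support vectors, `alpha` the role of dual weights.
class QuadraticKernelSVM(nn.Module):
    def __init__(self, n_landmarks, feat_dim, coef0=1.0):
        super().__init__()
        self.landmarks = nn.Parameter(torch.randn(n_landmarks, feat_dim))
        self.alpha = nn.Parameter(torch.zeros(n_landmarks))
        self.bias = nn.Parameter(torch.zeros(1))
        self.coef0 = coef0

    def forward(self, x):
        # quadratic kernel K(x, l) = (x . l + coef0)^2 against each landmark
        K = (x @ self.landmarks.T + self.coef0) ** 2
        return K @ self.alpha + self.bias  # decision value f(x), shape (batch,)

def margin_loss(f, y):
    # hinge loss for y in {-1, +1}: penalize samples inside the margin
    return torch.clamp(1.0 - y * f, min=0.0).mean()

# usage sketch:
# feats = backbone(inputs)              # features from your network
# model = QuadraticKernelSVM(n_landmarks=32, feat_dim=feats.shape[1])
# loss = margin_loss(model(feats), y)   # y in {-1, +1}
```

Note that the landmarks here are learned jointly with everything else rather than selected from the training set, so this only loosely mimics true support vectors; a faithful kernel SVM would pick them from the data.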