Why SGDClassifier with hinge loss is faster than SVC implementation in scikit-learn

Question

As we know, a support vector machine can be trained with SVC as well as with SGDClassifier using hinge loss. Is SGDClassifier with hinge loss faster than SVC? If so, why?

Links of both implementations of SVC in scikit-learn:
SVC
SGDClassifier

I read in the scikit-learn documentation that SVC uses an algorithm from the libsvm library for optimization, while SGDClassifier uses SGD (obviously).

Yes, the main difference is that SVC uses libsvm, while LinearSVC uses liblinear and SGDC uses its own stochastic gradient descent routine. The last two only fit linear models, so the execution time of LinearSVC will be roughly equal to that of SGDC - Noki, Feb 05 '20 at 07:59

Noki · Accepted Answer · 2020-02-05T09:20:10.493

It is probably best to start by trying some practical cases and reading the code. Let's start...

First of all, the documentation of SGDC says that it only trains linear models, so with hinge loss it is a linear SVM:

Linear classifiers (SVM, logistic regression, a.o.) with SGD training

What if, instead of the usual SVC, we use LinearSVC? Its documentation says:

Similar to SVC with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.

Let's add an example for the three types of algorithms:

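from sklearn.svm import SVC, LinearSVC
from sklearn.linear_model import SGDClassifier
import numpy as np

# Random 2-D features and random binary labels, only meant for timing
X = np.random.rand(20000, 2)
# 1-D label array; a (20000, 1) column vector triggers a DataConversionWarning
Y = np.random.choice(a=[False, True], size=20000)

# SVC optimizes the hinge loss; kernel='linear' keeps the model linear
svc = SVC(kernel='linear')

# hinge is the default loss of SGDClassifier, set here explicitly
sgd = SGDClassifier(loss='hinge')

# LinearSVC defaults to squared_hinge, so hinge is requested explicitly
svcl = LinearSVC(loss='hinge')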

Using Jupyter and the %%time cell magic, we get the execution time of each fit (similar approaches work in plain Python, but this is how I did it):

%%time
svc.fit(X, Y)

Wall time: 5.61 s

%%time
sgd.fit(X, Y)

Wall time: 24 ms

%%time
svcl.fit(X, Y)

Wall time: 26.5 ms
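For anyone not using Jupyter, the same comparison can be sketched in plain Python with time.perf_counter. This is only a rough sketch, not a careful benchmark: the sample count of 5000 and the fixed seed are my own assumptions to keep the run short, and absolute times will depend on your machine and scikit-learn version.

```python
import time

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC, LinearSVC

# Assumption: fewer samples than in the answer, so the SVC fit stays short
n_samples = 5000
rng = np.random.default_rng(0)
X = rng.random((n_samples, 2))
y = rng.choice([False, True], size=n_samples)

models = {
    "SVC(kernel='linear')": SVC(kernel="linear"),
    "SGDClassifier(loss='hinge')": SGDClassifier(loss="hinge"),
    "LinearSVC(loss='hinge')": LinearSVC(loss="hinge"),
}

# Time a single fit of each estimator on the same data
timings = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X, y)
    timings[name] = time.perf_counter() - start

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

Because the labels are random, LinearSVC may emit a convergence warning; that does not affect the timing comparison itself.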

As we can see, SVC is far slower than the other two, while LinearSVC and SGDC take more or less the same time. Their times will always differ a little, since the two estimators do not run the same code.

If you are interested in each implementation, I suggest reading the code on GitHub with the new GitHub code-reading tool, which is really good!

Code of linearSVC

Code of SGDC

score 1 · Answer 2 · answered Feb 04 '20 at 20:02


I think it's because of the batch size used in SGD: if you use the full batch with the SGD classifier, it should take about the same time as the SVM, but a smaller batch size can lead to faster convergence.
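One caveat on the batch-size idea: SGDClassifier has no batch-size parameter and updates its weights one sample at a time. The closest thing to mini-batches is feeding chunks of data through partial_fit. Below is a minimal sketch of that, assuming a chunk size of 1000 and made-up, linearly separable labels (both chosen purely for illustration).

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.random((20000, 2))
# Assumption: labels that a linear model can separate, for illustration only
y = X[:, 0] + X[:, 1] > 1.0

clf = SGDClassifier(loss="hinge", random_state=0)
classes = np.unique(y)  # partial_fit needs all classes on the first call
chunk_size = 1000       # hypothetical chunk size, not a scikit-learn setting

# One pass over the data, one chunk at a time
for start in range(0, len(X), chunk_size):
    stop = start + chunk_size
    clf.partial_fit(X[start:stop], y[start:stop], classes=classes)

accuracy = clf.score(X, y)
print(f"training accuracy after one pass: {accuracy:.3f}")
```

Within each chunk the updates are still made per sample, so the chunk size mainly controls how much data is held in memory at once.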


cerofrais


score 0 · Answer 3 · answered Feb 04 '20 at 19:20


The scikit-learn SVC is computationally expensive compared to SGDClassifier with loss='hinge', so we use the SGD classifier when speed matters. This only works for a linear SVM. If we need the 'rbf' kernel, SGD is not suitable.
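To illustrate the point about the 'rbf' kernel, here is a small sketch on data that is not linearly separable. The concentric-circles dataset from make_circles and its parameters are my own choice for the example. A linear model such as SGDClassifier cannot separate the two rings, while SVC with an RBF kernel can.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

# Two concentric rings: no straight line separates the classes
X, y = make_circles(n_samples=1000, factor=0.3, noise=0.05, random_state=0)

# Linear SVM trained with SGD versus kernel SVM with an RBF kernel
sgd = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

sgd_acc = sgd.score(X, y)
rbf_acc = rbf.score(X, y)
print(f"SGDClassifier (linear) training accuracy: {sgd_acc:.2f}")
print(f"SVC (rbf) training accuracy: {rbf_acc:.2f}")
```

So the speed advantage of SGDClassifier comes with the restriction to a linear decision boundary in the input features.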


bharat

